Abstract: Hoeffding's U-statistics model combinatorial-type
matrix parameters (appearing in CS theory) in a natural way.
This paper proposes using these statistics for analyzing random
compressed sensing matrices, in the non-asymptotic regime
(relevant to practice). The aim is to address certain pessimisms
of "worst-case" restricted isometry analyses, as observed by both
Blanchard & Dossal, et al.

We show how U-statistics can obtain "average-case" analyses,
by relating to statistical restricted isometry property (StRIP) type
recovery guarantees. However, unlike standard StRIP, random
signal models are not required; the analysis here holds in the
almost sure (probabilistic) sense. For Gaussian/bounded entry
matrices, we show that both $\ell_1$-minimization and LASSO essentially
require on the order of $k \cdot [\log((n-k)/u) + \sqrt{2(k/n)\log(n/k)}]$
measurements to respectively recover at least a $1 - 5u$ fraction, and
a $1 - 4u$ fraction, of the signals. Noisy conditions are considered.
Empirical evidence suggests that our analysis compares well to
Donoho & Tanner's recent large deviation bounds for
$\ell_0/\ell_1$-equivalence, in the regime of block lengths 1000 ~ 3000 with
high undersampling (50 ~ 150 measurements); similar system
sizes are found in recent CS implementations.

In this work, it is assumed throughout that matrix columns 
are independently sampled. 

Index Terms: approximation, compressed sensing, statistics,
random matrices



I. Introduction 

Compressed sensing (CS) analysis involves relatively recent
results from random matrix theory [1], whereby recovery
guarantees are framed in the context of matrix parameters known as
restricted isometry constants. Other matrix parameters are also
often studied in CS. Earlier work on sparse approximation
considered a matrix parameter known as mutual coherence.

Fuchs' work on Karush-Kuhn-Tucker (KKT) conditions
for sparsity pattern recovery considered a parameter involving
a matrix pseudoinverse [5], re-occurring in recent work [4],
[6], [7]. Finally, the null-space property [8]-[10] is gaining
recent popularity, being the parameter most closely related to the
fundamental compression limit dictated by Gel'fand widths.
All the above parameters share a similar feature: they
are defined over subsets of a certain fixed size $k$. This
combinatorial nature makes them difficult to evaluate, even
for moderate block lengths $n$. Most CS work therefore involves
some form of randomization to help the analysis.

While the celebrated $k\log(n/k)$ result was initially
approached via asymptotics, e.g., [1], [11]-[13], implementations
require finite block sizes. Hence, non-asymptotic analyses are
more application relevant. In the same practical aspect, recent
work deals with non-asymptotic analysis of deterministic
CS matrices, see pl, Q, IH, fBl. On the other hand 
certain situations may not allow control over the sampling
process, whereby the sampling may be inherently random, e.g.,
prediction of clinical outcomes of various tumors based on
gene expressions [6]. Random sampling has certain desirable
simplicity/efficiency features; see [16] on data acquisition in
the distributed sensor setting. Also, recent hardware
implementations point out energy/complexity-cost benefits of
implementing pseudo-random binary sequences [17]-[19]; these
sequences mimic statistical behavior. Non-asymptotic analysis
is particularly valuable when random samples are costly to
acquire. For example, clinical trials could be too expensive
to conduct an excessive number of times. In the systems
setting, the application could be running on a tight energy
budget, whereby processing/communication costs depend on
the number of samples acquired.

This work is inspired by the statistical notion of the
restricted isometry property (StRIP), initially developed for
deterministic CS analysis [14], [15]. The idea is to relax the
analysis, by allowing sampling matrix parameters (that
guarantee signal recovery) to be satisfied for a fraction of subsets.
Our interest is in "average-case" notions in the context of
randomized sampling, the reason being that certain pessimisms of
"worst-case" restricted isometry analyses have been observed
in past works [13], [20], [21]. On the other hand, in [22],
Donoho & Tanner remarked on potential benefits of the above
"average-case" notion, recently pursued in an adaptation of a
previous asymptotic result [23]. In the multichannel setting,
"average-case" notions are employed to make analysis more
tractable [24], [25]. In [26] a simple "thresholding" algorithm
is analyzed via an "average" coherence parameter. However,
the works in this respect are few; most random analyses are of
the "worst-case" type, see [12], [13], [21], [27]. We investigate
the unexplored, with the aim of providing new insights and
obtaining new/improved results for the "average-case".

Here we consider a random analysis tool that is well- 
suited to the CS context, yet seemingly left untouched in 
the literature. Our approach differs from that of deterministic 
matrices, where "average-case" analysis is typically made 
accessible via mutual coherence, see [l4l, ifTSll . ifTSl . For 
random matrices, we propose an alternative approach via
U-statistics, which do not require random signal models typically
introduced in StRIP analysis, see [14], [25], [26]; here, the
results are stated in the almost sure sense. U-statistics apply
naturally to various kinds of non-asymptotic CS analyses,
since they are designed for combinatorial-type parameters.
Also, they have a natural "average-case" interpretation, which
we apply to recent recovery guarantees that share the same
"average-case" characteristic. Finally, thanks to the wealth of
U-statistical literature, the theory developed here is open to
other extensions, e.g., in related work [28] we demonstrate
how U-statistics may also perform "worst-case" analysis.

Contributions: "Average-case" analyses are developed
based on U-statistics, which are i) empirically observed to have
good potential for predicting CS recovery in non-asymptotic
regimes, and ii) theoretically obtain measurement rates that
incorporate a non-zero failure rate (similar to the $k\log(n/k)$
rate from "worst-case" analyses). We utilize a U-statistical
large deviation concentration theorem, under the assumption
that the matrix columns are independently sampled. The large
deviation error bound holds almost surely (Theorem 1). No
random signal model is needed, and the error is of the
order $(n/k)^{-1}\log(n/k)$, whereby $k$ is the U-statistic kernel
size (and $k$ also equals the sparsity level). Gaussian/bounded
entry matrices are considered. For concreteness, we
connect with StRIP-type guarantees (from [7]) to study the
fraction of recoverable signals (i.e., "average-case" recovery)
of: i) $\ell_1$-minimization and ii) least absolute shrinkage
and selection operator (LASSO), under noisy conditions. For
both these algorithms we show that
$\mathrm{const} \cdot k\,[\log((n-k)/u) + \sqrt{2(k/n)\log(n/k)}]$
measurements are essentially required, to
respectively recover at least a $1 - 5u$ fraction (Theorem 2),
and a $1 - 4u$ fraction (Theorem 3), of possible signals. This
is improved to a $1 - 3u$ fraction for the noiseless case. Here
$\mathrm{const} = \max(4/(a_1 a_2)^2,\ 2c_1/(0.29 - a_1)^2)$ for
to-be-specified constants $a_1, a_2, c_1$, where $c_1$ depends on the
distribution of matrix entries. Note that the term
$\sqrt{2(k/n)\log(n/k)}$ is at most 1 and vanishes with small $k/n$.
Empirical evidence suggests that our approach compares well with
recent results from Donoho & Tanner [23]; improvement is suggested
for system sizes found in implementations [17], with large
undersampling (i.e., $m$ = 50 - 100 and $n$ = 1000 - 3000).
The large deviation analysis here does show some pessimism
in the size of const above, whereby const > 4 (we conjecture
possible improvement). For Gaussian/Bernoulli matrices, we
find const $\approx$ 1.8 to be inherently smaller, e.g., for $k = 4$ this
predicts recovery of 1 x 10~^ fraction with 153 measurements
(empirically $m = 150$).
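To make the stated measurement rate concrete, the following minimal Python sketch evaluates the expression const · k [log((n − k)/u) + sqrt(2(k/n) log(n/k))] and numerically checks the remark that the square-root term stays below 1. The function names and the sample values of `n`, `u` and `const` are illustrative assumptions, not values fixed by the paper.

```python
import math


def correction_term(n: int, k: int) -> float:
    """The term sqrt(2 (k/n) log(n/k)) from the measurement rate."""
    return math.sqrt(2.0 * (k / n) * math.log(n / k))


def predicted_measurements(n: int, k: int, u: float, const: float) -> float:
    """const * k * [log((n - k)/u) + sqrt(2 (k/n) log(n/k))].

    `const` and `u` are supplied by the caller; the values used in the
    demo below are assumptions chosen only for illustration.
    """
    return const * k * (math.log((n - k) / u) + correction_term(n, k))


if __name__ == "__main__":
    n = 1000
    # Largest value of the correction term over all sparsity levels k < n.
    worst = max(correction_term(n, k) for k in range(1, n))
    print(f"largest correction term for n={n}: {worst:.3f}")
    for k in (2, 4, 8):
        print(k, round(predicted_measurements(n, k, u=0.01, const=1.8), 1))
```

The maximum of 2x log(1/x) over 0 < x ≤ 1 is 2/e (attained at x = 1/e), so the correction term never exceeds sqrt(2/e) ≈ 0.86, which is consistent with the "at most 1" remark.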

Note: StRIP-type guarantees [6], [7] seem to work well,
by simply not placing restrictive conditions on the maximum
eigenvalues of the size-$k$ submatrices. Our theory applies fairly
well for various considered system sizes $k, m, n$ (e.g., Figure
4); however, in noisy situations, a (relatively small) factor of
$\sqrt{k}$ losses is seen without making certain maximum eigenvalue
assumptions. For $\ell_1$-recovery, the estimation error is now
bounded by a $\sqrt{k}$ factor of its best $k$-term approximation error
(both errors measured using the $\ell_1$-norm). For LASSO, the
non-zero signal magnitudes must now be bounded below by
a factor $\sqrt{2k\log n}$ (with respect to noise standard deviation),
as opposed to $\sqrt{2\log n}$ in [6]. These losses occur not because
of StRIP analyses, but because of the estimation techniques
employed here.

Organization: We begin with relevant background on CS
in Section II. In Section III we present a general U-statistical
theorem for large-deviation ("average-case") behavior. In
Section IV the U-statistical machinery is applied to StRIP-type
"average-case" recovery. We conclude in Section V.

Notation: The set of real numbers is denoted $\mathbb{R}$.
Deterministic quantities are denoted using $a$, $\mathbf{a}$, or $\mathbf{A}$, where bold fonts
denote vectors (i.e., $\mathbf{a}$) or matrices (i.e., $\mathbf{A}$). Random quantities
are denoted using upper-case italics, where $A$ is a random
variable (RV), and $\boldsymbol{A}$ a random vector/matrix. Let $\Pr\{A \le a\}$
denote the probability that event $\{A \le a\}$ occurs. Sets are
denoted using braces, e.g., $\{1, 2, \cdots\}$. The notation $\mathbb{E}$ denotes
expectation. The notation $i, j, \ell, \omega$ is used for indexing. We let
$\|\cdot\|_p$ denote the $\ell_p$-norm for $p = 1$ and $2$.

II. Preliminaries 
A. Compressed Sensing (CS) Theory

A vector $\mathbf{a}$ is said to be $k$-sparse, if at most $k$ vector
coefficients are non-zero (i.e., its $\ell_0$-distance satisfies $\|\mathbf{a}\|_0 \le k$).
Let $n$ be a positive integer that denotes block length, and let
$\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \cdots, \alpha_n]^T$ denote a length-$n$ signal vector with
signal coefficients $\alpha_i$. The best $k$-term approximation $\boldsymbol{\alpha}_k$ of $\boldsymbol{\alpha}$
is obtained by finding the $k$-sparse vector $\boldsymbol{\alpha}_k$ that has minimal
approximation error $\|\boldsymbol{\alpha}_k - \boldsymbol{\alpha}\|_2$.

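Let $\boldsymbol{\Phi}$ denote an $m \times n$ CS sampling matrix, where
$m < n$. The length-$m$ measurement vector, denoted
$\mathbf{b} = [b_1, b_2, \cdots, b_m]^T$, of some length-$n$ signal $\boldsymbol{\alpha}$ is formed as
$\mathbf{b} = \boldsymbol{\Phi}\boldsymbol{\alpha}$. Recovering $\boldsymbol{\alpha}$ from $\mathbf{b}$ is challenging as $\boldsymbol{\Phi}$ possesses
a non-trivial null-space. We typically recover $\boldsymbol{\alpha}$ by solving the
(convex) $\ell_1$-minimization problem

$$\min_{\tilde{\boldsymbol{\alpha}}} \|\tilde{\boldsymbol{\alpha}}\|_1 \quad \text{s.t.} \quad \|\tilde{\mathbf{b}} - \boldsymbol{\Phi}\tilde{\boldsymbol{\alpha}}\|_2 \le \epsilon. \qquad (1)$$

The vector $\tilde{\mathbf{b}}$ is a noisy version of the original measurements
$\mathbf{b}$, and here $\epsilon$ bounds the noise error, i.e., $\epsilon \ge \|\tilde{\mathbf{b}} - \mathbf{b}\|_2$.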
Recovery conditions have been considered in many flavors (2\, 
lEI, ifTTl . 1221 . |l23|, and mostly rely on studying parameters 
of the sampling matrix $\boldsymbol{\Phi}$.

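For $k \le n$, the $k$-th restricted isometry constant $\delta_k$ of an
$m \times n$ matrix $\boldsymbol{\Phi}$ equals the smallest constant that satisfies

$$(1 - \delta_k)\|\boldsymbol{\alpha}\|_2^2 \le \|\boldsymbol{\Phi}\boldsymbol{\alpha}\|_2^2 \le (1 + \delta_k)\|\boldsymbol{\alpha}\|_2^2, \qquad (2)$$

for any $k$-sparse $\boldsymbol{\alpha}$ in $\mathbb{R}^n$. The following well-known recovery
guarantee is stated w.r.t. $\delta_k$ in (2).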

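Theorem A, c.f., [29]: Let $\boldsymbol{\Phi}$ be the sensing matrix. Let $\boldsymbol{\alpha}$
denote the signal vector. Let $\mathbf{b}$ be the measurements, i.e.,
$\mathbf{b} = \boldsymbol{\Phi}\boldsymbol{\alpha}$. Assume that the $(2k)$-th restricted isometry constant $\delta_{2k}$
of $\boldsymbol{\Phi}$ satisfies $\delta_{2k} < \sqrt{2} - 1$, and further assume that the noisy
version $\tilde{\mathbf{b}}$ of $\mathbf{b}$ satisfies $\|\tilde{\mathbf{b}} - \mathbf{b}\|_2 \le \epsilon$. Let $\boldsymbol{\alpha}_k$ denote the
best-$k$ approximation to $\boldsymbol{\alpha}$. Then the $\ell_1$-minimum solution $\boldsymbol{\alpha}^*$
to (1) satisfies

$$\|\boldsymbol{\alpha}^* - \boldsymbol{\alpha}\|_1 \le C_1 \|\boldsymbol{\alpha} - \boldsymbol{\alpha}_k\|_1 + C_2\,\epsilon,$$

for small constants $C_1 = 4\sqrt{1 + \delta_{2k}}/(1 - \delta_{2k}(1 + \sqrt{2}))$ and
$C_2 = 2(\delta_{2k}(1 - \sqrt{2}) - 1)/(\delta_{2k}(1 + \sqrt{2}) - 1)$.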

Theorem A is very powerful, on condition that we know the
constants $\delta_k$. But because of their combinatorial nature,
computing the restricted isometry constants $\delta_k$ is NP-Hard ifTSl .
Let $S$ denote a size-$k$ subset of indices. Let $\boldsymbol{\Phi}_S$ denote the size
$m \times k$ submatrix of $\boldsymbol{\Phi}$ indexed on (column indices) in $S$. Let
$\sigma^2_{\max}(\boldsymbol{\Phi}_S)$ and $\sigma^2_{\min}(\boldsymbol{\Phi}_S)$ respectively denote the maximum and
minimum squared-singular values of $\boldsymbol{\Phi}_S$. Then from (2), if the
columns $\boldsymbol{\phi}_i$ of $\boldsymbol{\Phi}$ are properly normalized, i.e., if $\|\boldsymbol{\phi}_i\|_2 = 1$,
we deduce that $\delta_k$ is the smallest constant in $\mathbb{R}$ that satisfies

$$\delta_k \ge \max\left(\sigma^2_{\max}(\boldsymbol{\Phi}_S) - 1,\ 1 - \sigma^2_{\min}(\boldsymbol{\Phi}_S)\right) \qquad (3)$$

for all $\binom{n}{k}$ size-$k$ subsets $S$. For large $n$, the number $\binom{n}{k}$ is
huge. Fortunately $\delta_k$ need not be explicitly computed, if we
can estimate it after incorporating randomization [11].
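The characterization (3) can be evaluated directly on a toy matrix. The following is a minimal Python sketch, assuming NumPy is available; the function name `restricted_isometry_constant` and the toy dimensions are illustrative choices, not part of the paper. It enumerates every size-k column subset, so it is only feasible for very small n and k, which is exactly the combinatorial difficulty noted above.

```python
import itertools

import numpy as np


def restricted_isometry_constant(phi: np.ndarray, k: int) -> float:
    """Smallest delta satisfying (3) over all size-k column subsets of phi.

    Assumes the columns of phi have unit Euclidean norm.
    """
    n = phi.shape[1]
    delta = 0.0
    for subset in itertools.combinations(range(n), k):
        # Singular values are returned in descending order.
        sv = np.linalg.svd(phi[:, list(subset)], compute_uv=False)
        delta = max(delta, sv[0] ** 2 - 1.0, 1.0 - sv[-1] ** 2)
    return delta


# Toy example (sizes are arbitrary assumptions for illustration).
rng = np.random.default_rng(0)
phi = rng.standard_normal((10, 15))
phi /= np.linalg.norm(phi, axis=0)  # normalize each column to unit norm
delta_2 = restricted_isometry_constant(phi, 2)
delta_3 = restricted_isometry_constant(phi, 3)
print(delta_2, delta_3)
```

For unit-norm columns and k = 2, the Gram matrix of any column pair has eigenvalues 1 ± |c|, where c is the inner product of the two columns, so the constant computed here for k = 2 coincides with the largest absolute inner product between distinct columns.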

Recovery guarantee Theorem A involves "worst-case"
analysis. If the inequality (3) is violated for any one submatrix $\boldsymbol{\Phi}_S$,
then the whole matrix $\boldsymbol{\Phi}$ is deemed to have restricted isometry
constant larger than $\delta_k$. A common complaint of such
"worst-case" analyses is pessimism, e.g., in [20] it is found that for
$n = 4000$ and $m = 1000$, the restricted isometry property
is not even satisfied for sparsity $k = 5$. This motivates the
"average-case" analysis investigated here, where the recovery
guarantee is relaxed to hold for a large "fraction" of signals
(useful in applications that do not demand all possible signals
to be completely recovered). We draw ideas from the statistical
StRIP notion used in deterministic CS, which only requires
"most" of the submatrices $\boldsymbol{\Phi}_S$ to satisfy some properties.

In statistics, a well-known notion of a U-statistic (introduced
in the next subsection) is very similar to StRIP. We will show
how U-statistics naturally lead to "average-case" analysis.

B. U-statistics & StRIP 

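A function $\zeta : \mathbb{R}^{m \times k} \to \mathbb{R}$ is said to be a kernel, if for
any $\mathbf{A}, \mathbf{A}' \in \mathbb{R}^{m \times k}$, we have $\zeta(\mathbf{A}) = \zeta(\mathbf{A}')$ if matrix $\mathbf{A}'$ can
be obtained from $\mathbf{A}$ by column reordering. Let $\mathbb{R}_{[0,1]}$ be the
set of real numbers bounded below by 0 and above by 1, i.e.,
$\mathbb{R}_{[0,1]} = \{a \in \mathbb{R} : 0 \le a \le 1\}$. U-statistics are associated
with functions $g : \mathbb{R}^{m \times k} \times \mathbb{R} \to \mathbb{R}_{[0,1]}$ known as bounded
kernels. To obtain bounded kernels $g$ from indicator functions,
simply use some kernel $\zeta$ and set $g(\mathbf{A}, a) = 1\{\zeta(\mathbf{A}) \le a\}$ or
$g(\mathbf{A}, a) = 1\{\zeta(\mathbf{A}) > a\}$, e.g., $1\{\sigma^2_{\max}(\mathbf{A}) \le a\}$.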

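Definition 1 (Bounded Kernel U-Statistics). Let $\boldsymbol{A}$ be a
random matrix with $n$ columns. Let $\boldsymbol{\Phi}$ be sampled as $\boldsymbol{\Phi} = \boldsymbol{A}$.
Let $g : \mathbb{R}^{m \times k} \times \mathbb{R} \to \mathbb{R}_{[0,1]}$ be a bounded kernel. For any
$a \in \mathbb{R}$, the following quantity

$$U_n(a) \triangleq \frac{1}{\binom{n}{k}} \sum_{S} g(\boldsymbol{\Phi}_S, a) \qquad (4)$$

is a U-statistic of the sampled realization $\boldsymbol{\Phi} = \boldsymbol{A}$,
corresponding to the kernel $g$. In (4), the matrix $\boldsymbol{\Phi}_S$ is the submatrix of
$\boldsymbol{\Phi}$ indexed on column indices in $S$, and the sum takes place
over all size-$k$ subsets $S$ in $\{1, 2, \cdots, n\}$. Note, $0 \le U_n(a) \le 1$.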
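Definition 1 translates almost line by line into code. The sketch below, in Python with NumPy assumed available, averages an indicator kernel over all size-k column subsets of a sampled matrix. The names `u_statistic` and `sigma_min_sq_kernel`, the choice of the minimum squared singular value as kernel, and the toy sizes are illustrative assumptions.

```python
import itertools

import numpy as np


def u_statistic(phi, k, kernel, a):
    """Average of the bounded kernel g(phi_S, a) over all size-k column subsets S."""
    n = phi.shape[1]
    values = [kernel(phi[:, list(s)], a)
              for s in itertools.combinations(range(n), k)]
    return sum(values) / len(values)


def sigma_min_sq_kernel(sub, a):
    """Indicator 1{sigma_min^2(sub) <= a}; unchanged by column reordering."""
    smallest = np.linalg.svd(sub, compute_uv=False)[-1]
    return float(smallest ** 2 <= a)


# Toy sampled realization: Gaussian entries with variance 1/m (sizes are assumptions).
rng = np.random.default_rng(1)
m, n, k = 25, 30, 2
phi = rng.standard_normal((m, n)) / np.sqrt(m)
curve = [u_statistic(phi, k, sigma_min_sq_kernel, a) for a in (0.2, 0.6, 1.0, 1.4)]
print(curve)
```

Because the kernel is an indicator of the form 1{· ≤ a}, the resulting values are fractions of subsets, lie in [0, 1], and are non-decreasing in a.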

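For $k \le n$ and positive $u$ where $u < 1$, a matrix $\boldsymbol{\Phi}$ has
$u$-StRIP constant $\delta_k$, if $\delta_k$ is the smallest constant s.t.

$$(1 - \delta_k)\|\boldsymbol{\alpha}\|_2^2 \le \|\boldsymbol{\Phi}_S\boldsymbol{\alpha}\|_2^2 \le (1 + \delta_k)\|\boldsymbol{\alpha}\|_2^2 \qquad (5)$$

for any $\boldsymbol{\alpha} \in \mathbb{R}^k$ and fraction $u$ of size-$k$ subsets $S$. The
difference between (5) and (2) is that $\boldsymbol{\Phi}_S$ is in place of $\boldsymbol{\Phi}$.
This StRIP notion coincides with [7]. Consider
$\zeta(\mathbf{A}) = \max(\sigma^2_{\max}(\mathbf{A}) - 1,\ 1 - \sigma^2_{\min}(\mathbf{A}))$, where here $\zeta$ is a kernel.
Obtain a bounded kernel $g$ by setting $g(\mathbf{A}, a) = 1\{\zeta(\mathbf{A}) > a\}$.
Construct a U-statistic $U_n(\delta)$ of $\boldsymbol{\Phi}$ of the form
$U_n(\delta) = \binom{n}{k}^{-1} \sum_S 1\{\zeta(\boldsymbol{\Phi}_S) > \delta\}$. Then if this U-statistic satisfies
$U_n(\delta) = 1 - u$, the $u$-StRIP constant $\delta_k$ of $\boldsymbol{\Phi}$ is at most
$\delta$, i.e., $\delta_k \le \delta$.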

To exploit apparent similarities between U-statistics and
StRIP, we turn to two "average-case" guarantees found in
the StRIP literature. In the sequel, the conditions required
by these two guarantees will be analyzed in detail via
U-statistics; for now let us recap these guarantees. First, an
$\ell_1$-minimization recovery guarantee recently given in [7] is a
StRIP-adapted version of the "worst-case" guarantee Theorem
A. For any non-square matrix $\mathbf{A}$, let $\mathbf{A}^\dagger$ denote the
Moore-Penrose pseudoinverse$^1$. A vector with entries in $\{-1, 1\}$ is
termed a sign vector. For $\boldsymbol{\alpha} \in \mathbb{R}^n$, we write $\boldsymbol{\alpha}_S$ for the
length-$k$ vector supported on $S$. Let $S_c$ denote the complementary
set of $S$, i.e., $S_c = \{1, 2, \cdots, n\} \setminus S$. The "average-case"
guarantees require us to check conditions on $\boldsymbol{\Phi}$ for fractions
of subsets $S$, or sign-subset pairs $(\boldsymbol{\beta}, S)$.

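Theorem B, c.f., Lemma 3, [7]: Let $\boldsymbol{\Phi}$ be an $m \times n$ sensing
matrix. Let $S$ be a size-$k$ subset, and let $\boldsymbol{\beta} \in \{-1, 1\}^k$. Assume
that $\boldsymbol{\Phi}$ satisfies

• invertibility: for at least a fraction $1 - u_1$ of subsets $S$,
the condition $\sigma_{\min}(\boldsymbol{\Phi}_S) \ge a_1$ holds.

• small projections: for at least a fraction $1 - u_2$ of
sign-subset pairs $(\boldsymbol{\beta}, S)$, the condition

$$|(\boldsymbol{\Phi}_S^\dagger \boldsymbol{\phi}_i)^T \boldsymbol{\beta}| \le a_2 \quad \text{for every } i \notin S$$

holds, where we assume the constant $a_2 < 1$.

• worst-case projections: for at least a fraction $1 - u_3$ of
subsets $S$, the following condition holds

$$\|\boldsymbol{\Phi}_S^\dagger \boldsymbol{\phi}_i\|_1 \le a_3 \quad \text{for every } i \notin S.$$

Then for a fraction $1 - u_1 - u_2 - u_3$ of sign-subset pairs $(\boldsymbol{\beta}, S)$,
the following error bounds are satisfied

$$\|\boldsymbol{\alpha}^*_S - \boldsymbol{\alpha}_S\|_1 \le \frac{2a_3}{1 - a_2}\,\|\boldsymbol{\alpha} - \boldsymbol{\alpha}_k\|_1,$$
$$\|\boldsymbol{\alpha}^*_{S_c} - \boldsymbol{\alpha}_{S_c}\|_1 \le \frac{2}{1 - a_2}\,\|\boldsymbol{\alpha} - \boldsymbol{\alpha}_k\|_1,$$

where $\boldsymbol{\alpha}$ is a signal vector that satisfies $\mathrm{sgn}(\boldsymbol{\alpha}_S) = \boldsymbol{\beta}$, and $\boldsymbol{\alpha}_k$
is the best-$k$ approximation of $\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}_k$ is supported on $S$,
and finally $\boldsymbol{\alpha}^*$ is the solution to (1) where the measurements
$\mathbf{b}$ satisfy $\mathbf{b} = \boldsymbol{\Phi}\boldsymbol{\alpha}$.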

For convenience, the proof is provided in Supplementary
Material A. The second guarantee is a StRIP-type recovery
guarantee for the LASSO estimate, based on [6] (also see [7]).
Consider recovery from noisy measurements

$$\tilde{\mathbf{b}} = \boldsymbol{\Phi}\boldsymbol{\alpha} + \mathbf{z},$$

here $\mathbf{z}$ is a length-$m$ noise realization vector. We assume that
the entries $z_i$ of $\mathbf{z}$ are sampled from a zero-mean Gaussian
distribution with variance $c_Z^2$. The LASSO estimate considered
in [6] is the optimal solution $\boldsymbol{\alpha}^*$ of the optimization problem

$$\min_{\tilde{\boldsymbol{\alpha}} \in \mathbb{R}^n} \frac{1}{2}\|\tilde{\mathbf{b}} - \boldsymbol{\Phi}\tilde{\boldsymbol{\alpha}}\|_2^2 + 2 c_Z \theta_n \|\tilde{\boldsymbol{\alpha}}\|_1. \qquad (6)$$

$^1$If $\mathbf{A}$ has full column rank, then $\mathbf{A}^\dagger = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$.

The $\ell_1$-regularization parameter is chosen as a product of two
terms $c_Z$ and $\theta_n$, where we specify $\theta_n = (1 + a)\sqrt{2\log n}$
for some positive $a$. What differs from convention is that the
regularization depends on the noise standard deviation $c_Z$. We
assume $c_Z > 0$, otherwise there will be no $\ell_1$-regularization.

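Theorem C, c.f., [6]: Let $\boldsymbol{\Phi}$ be the $m \times n$ sensing matrix. Let
$S$ be a size-$k$ subset, and let $\boldsymbol{\beta} \in \{-1, 1\}^k$.

• invertibility: for at least a fraction $1 - u_1$ of subsets $S$,
the condition $\sigma_{\min}(\boldsymbol{\Phi}_S) \ge a_1$ holds.

• small projections: for at least a fraction $1 - u_2$ of subsets
$S$, same as Theorem B.

• invertibility projections: for at least a fraction $1 - u_3$ of
sign-subset pairs $(\boldsymbol{\beta}, S)$, the following condition holds

$$\|(\boldsymbol{\Phi}_S^T\boldsymbol{\Phi}_S)^{-1}\boldsymbol{\beta}\|_\infty \le a_3.$$

Let $c_Z$ denote noise standard deviation. Assume the Gaussian
noise realization $\mathbf{z}$ in measurements $\tilde{\mathbf{b}}$ satisfies

i) $\|(\boldsymbol{\Phi}_S^T\boldsymbol{\Phi}_S)^{-1}\boldsymbol{\Phi}_S^T\mathbf{z}\|_\infty \le (c_Z\sqrt{2\log n})/a_1$, for the
constant $a_1$ in the invertibility condition.

ii) $\|\boldsymbol{\Phi}_{S_c}^T(\mathbf{I} - \boldsymbol{\Phi}_S\boldsymbol{\Phi}_S^\dagger)\mathbf{z}\|_\infty \le c_Z\, 2\sqrt{\log n}$, where $S_c$ is the
complementary set of $S$.

For some positive $a$, assume that constant $a_2$ in the small
projections condition satisfies

$$(\sqrt{2}(1 + a))^{-1} + a_2 < 1. \qquad (7)$$

Then for a fraction $1 - u_1 - u_2 - u_3$ of sign-subset pairs
$(\boldsymbol{\beta}, S)$, the LASSO estimate $\boldsymbol{\alpha}^*$ from (6) with regularization
$\theta_n = (1 + a)\sqrt{2\log n}$ for the same $a$ above, will successfully
recover both signs and supports of $\boldsymbol{\alpha}$, if

$$|\alpha_i| \ge [a_1^{-1} + 2a_3(1 + a)] \cdot c_Z\sqrt{2\log n} \quad \text{for all } i \in S. \qquad (8)$$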
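The scalar quantities that Theorem C depends on are simple to evaluate. The following minimal Python sketch computes the regularization factor, checks condition (7), and evaluates the magnitude threshold in (8). It assumes the readings θ_n = (1 + a) sqrt(2 log n), (sqrt(2)(1 + a))^{-1} + a_2 < 1, and [1/a_1 + 2 a_3 (1 + a)] c_Z sqrt(2 log n) for these three expressions; the function names and any numeric inputs are hypothetical.

```python
import math


def theta_n(n: int, a: float) -> float:
    """Regularization factor, assumed to be (1 + a) * sqrt(2 log n)."""
    return (1.0 + a) * math.sqrt(2.0 * math.log(n))


def small_projection_ok(a: float, a2: float) -> bool:
    """Condition (7), assumed to read (sqrt(2) (1 + a))^{-1} + a2 < 1."""
    return 1.0 / (math.sqrt(2.0) * (1.0 + a)) + a2 < 1.0


def magnitude_threshold(n: int, a: float, a1: float, a3: float, c_z: float) -> float:
    """Right-hand side of (8), assumed to be [1/a1 + 2 a3 (1 + a)] c_z sqrt(2 log n)."""
    return (1.0 / a1 + 2.0 * a3 * (1.0 + a)) * c_z * math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    # Hypothetical constants, chosen only to exercise the formulas.
    n, a, a1, a2, a3, c_z = 1000, 1.0, 0.5, 0.25, 2.0, 0.1
    print(theta_n(n, a), small_projection_ok(a, a2), magnitude_threshold(n, a, a1, a3, c_z))
```

Under these readings the threshold splits into two parts: c_Z sqrt(2 log n)/a_1, which matches noise condition i), and 2 a_3 c_Z θ_n, which couples the regularization level with the invertibility projections constant.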

Because of some differences from [6], we also provide
the proof in Supplementary Material A. In [6] it is shown
that the noise conditions i) and ii) are satisfied with large
probability, at least $1 - n^{-1}(2\pi\log n)^{-1/2}$ (see Proposition ?
in Supplementary Material A). Theorem C is often referred
to as a sparsity pattern recovery result, in the sense that it
guarantees recovery of the sign-subset pairs $(\boldsymbol{\beta}, S)$ belonging
to a $k$-sparse signal $\boldsymbol{\alpha}$. Fuchs established some of the earlier
important results, see [5], [30], [31].

In Theorems B and C, observe that the invertibility
condition can be easily checked using a U-statistic; simply set
the bounded kernel $g$ as $g(\mathbf{A}, a_1) = 1\{\sigma_{\min}(\mathbf{A}) \le a_1\}$ for
some positive $a_1$ and measure the fraction $U_n(a_1) = u_1$. Other
conditions require slightly different kernels, to be addressed in
upcoming Section IV. But first we introduce the main
U-statistical large deviations theorem (central to our analyses) in
the next section.

III. Large Deviation Theorem: "Average-case" Behavior

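Consider two bounded kernels $g$ defined for $\mathbf{A} \in \mathbb{R}^{m \times k}$,
corresponding to maximum and minimum squared singular
values

$$g(\mathbf{A}, a) = 1\{\sigma^2_{\max}(\mathbf{A}) \le a\}, \text{ and} \qquad (9)$$
$$g(\mathbf{A}, a) = 1\{\sigma^2_{\min}(\mathbf{A}) \le a\}. \qquad (10)$$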




Fig. 1. Gaussian measure. Concentration of U-statistic $U_n(a)$ for squared
singular value $\sigma^2_{\min}$ and $\sigma^2_{\max}$ kernels $g$, see (9) and (10). Shown for
$m = 25$, $k = 2$ and two values of $n = 25$ and $100$. Horizontal axis: values $a$.

Note that restricted isometry conditions (2) and (5) depend on
both $\sigma^2_{\min}$ and $\sigma^2_{\max}$ behaviors, although the conditions in the
previous StRIP-recovery guarantees Theorem B are explicitly
imposed only on $\sigma_{\min}$. See [13], [32] for the different behaviors
and implications of these two extremal eigenvalues. In this
section we consider two U-statistics, corresponding separately
to (9) and (10).

Let $\boldsymbol{A}_i$ denote the $i$-th column of $\boldsymbol{A}$, and assume $\boldsymbol{A}_i$ to be
IID. For a bounded kernel $g$, let $p(a)$ denote the expectation
$\mathbb{E}g(\boldsymbol{A}_S, a)$, i.e., $p(a) = \mathbb{E}g(\boldsymbol{A}_S, a)$ for any size-$k$ subset $S$.
Since $p(a) = \mathbb{E}U_n(a)$, the U-statistic mean $\mathbb{E}U_n(a)$ does
not depend on block length $n$.

Theorem 1. Let $\boldsymbol{A}$ be an $m \times n$ random matrix, whereby
the columns $\boldsymbol{A}_i$ are IID. Let $g$ be a bounded kernel
that maps $\mathbb{R}^{m \times k} \times \mathbb{R} \to \mathbb{R}_{[0,1]}$ and let $p(a) = \mathbb{E}g(\boldsymbol{A}_S, a) = \mathbb{E}U_n(a)$.
Let $U_n(a)$ be a U-statistic of the sampled realization
$\boldsymbol{\Phi} = \boldsymbol{A}$ corresponding to the bounded kernel $g$. Then almost
surely when $n$ is sufficiently large, the deviation
$|U_n(a) - p(a)| \le \epsilon_n(a)$ is bounded by an error term $\epsilon_n(a)$ that satisfies

$$\epsilon_n^2(a) = 2p(a)(1 - p(a)) \cdot (n/k)^{-1}\log(n/k). \qquad (11)$$

Theorem 1 is shown by piecing together (5.5) in [33] and
Lemma 2.1 in [34]. The proof is given in Appendix A. Figure 1
empirically illustrates this concentration result for $g$ in (9) and
(10), corresponding to $p(a) = \mathbb{E}g(\boldsymbol{A}_S, a) = \Pr\{\sigma^2_{\max}(\boldsymbol{A}_S) \le a\}$
and $p(a) = \Pr\{\sigma^2_{\min}(\boldsymbol{A}_S) \le a\}$. Empirical simulation
of restricted isometries is very difficult, thus we chose small
values $k = 2$, $m = 25$ and block lengths $n = 25$ and $n = 100$.
For $n = 25$ the deviation $|U_{25}(a) - p(a)|$ is very noticeable
for all values of $a$ and both $\sigma^2_{\max}$ and $\sigma^2_{\min}$. However for larger
$n = 100$, the deviation $|U_{100}(a) - p(a)|$ clearly becomes much
smaller. This is predicted by the vanishing error $\epsilon_n(a)$ given in
Theorem 1, which drops as the ratio $n/k$ increases. In fact if
$k$ is kept constant then the error behaves as $O(n^{-1}\log n)$.
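The error term in (11) is easy to tabulate. The minimal Python sketch below evaluates ε_n(a) for a given value of p(a), reading (11) as ε_n²(a) = 2 p(a)(1 − p(a)) (n/k)^{-1} log(n/k). The function name and the choice p = 0.5 (the value that maximizes p(1 − p)) are illustrative assumptions.

```python
import math


def deviation_error(p: float, n: int, k: int) -> float:
    """epsilon_n(a) for p = p(a): sqrt(2 p (1 - p) (k/n) log(n/k))."""
    return math.sqrt(2.0 * p * (1.0 - p) * (k / n) * math.log(n / k))


if __name__ == "__main__":
    # Settings matching the text: k = 2 with block lengths n = 25 and n = 100.
    for n in (25, 100, 1000):
        print(n, round(deviation_error(0.5, n, 2), 4))
```

Note that x^{-1} log x is decreasing only for x > e, so the bound shrinks monotonically with n/k once n/k exceeds roughly 2.72; the settings n/k = 12.5 and n/k = 50 used in the text are both in that range.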

Table I reproduces$^2$ a sample of (asymptotic) estimates for
both $\sigma^2_{\max}$ and $\sigma^2_{\min}$ cases, taken from [21]. These estimates
are derived for "worst-case" analysis, under the assumption that
every entry $A_{ij}$ of $\boldsymbol{A}$ is IID and Gaussian distributed (i.e., $A_{ij}$
is Gaussian with variance $1/m$). Table I presents the estimates
$^2$We point out that Bah actually defined two separate restricted isometry
constants, each corresponding to $\sigma^2_{\min}$ and $\sigma^2_{\max}$ [21]. In this paper, to
align the presentation with our discussion on squared singular values, their
results will be discussed in the domain of $\sigma^2_{\min}$ and $\sigma^2_{\max}$.
Fig. 2. Means $p(a) = \mathbb{E}U_n(a)$ for predicting the concentration of $U_n(a)$.
Shown for the Gaussian case, (a) $m = 50$ and (b) $m = 150$.



TABLE I

Asymptotic Lower and Upper Bounds on "Worst-Case"
Eigenvalues, [21]

         Minimum: $\sigma^2_{\min}$         Maximum: $\sigma^2_{\max}$
         m/n                       m/n
k/m      0.1     0.3     0.5       0.1     0.3     0.5
0.1      0.095   0.118   0.130     3.952   3.610   3.459
0.2      0.015   0.026   0.034     5.587   4.892   4.535
0.3      0.003   0.006   0.010     6.939   5.806   5.361

according$^3$ to fixed ratios $k/m$ and $m/n$. To compare, Figure
2 shows the expectations $p(a) = \mathbb{E}U_n(a)$. The values $p(a)$
are interpreted as fractions, and as $n/k$ becomes large $p(a)$
is approached by $U_n(a)$ within a stipulated error $\epsilon_n$. Figure
2 is empirically obtained, though note that in the Gaussian case
for $p(a)$ we also have exact expressions [32], [35], and
the Bartlett decomposition [36], available. Again $p(a)$ is a
marginal quantity (i.e., it does not depend on $n$) and simulation
is reasonably feasible. In the spirit of non-asymptotics, we
consider relatively small $k, m$ values as compared to other
works [20], [21]; these adopted values are nevertheless
"practical", in the sense that they come from an implementation paper [17].

Differences are apparent from comparing "average-case"
(Figure 2) and "worst-case" (Table I) behavior. Consider
$k/m = 0.3$, where Table I shows that for all undersampling ratios
$m/n$, the worst-case estimate of $\sigma^2_{\min}$ is very small,
approximately 0.01. But for fixed $m = 50$ and $m = 150$, Figures 2(a)
and (b) show that for respectively $k = 0.3 \cdot (50) = 15$ and
$k = 45$, a large fraction of subsets $S$ seem to have $\sigma^2_{\min}(\boldsymbol{\Phi}_S)$
lying above 0.1. From Table I the estimates for $\sigma^2_{\min}$ get
worse (i.e., get smaller) as $m/n$ decreases. But the error
$\epsilon_n(a)$ in Theorem 1 vanishes with larger $n/k$. For the other
$\sigma^2_{\max}$ case, we similarly observe that the values in Table I also
appear more "pessimistic".

We emphasize that Theorem 1 holds regardless of
distribution. Figure 3 is the counterpart figure for Bernoulli and
Uniform cases (i.e., each entry $A_{ij}$ is respectively drawn uniformly
from $\{-1/\sqrt{m}, 1/\sqrt{m}\}$, or $\{a \in \mathbb{R} : |a| \le \sqrt{3/m}\}$), shown
for $m = 50$. Minute differences are seen when comparing with
previous Figure 2. For $k = 3$, we observe the fraction $p(a)$
corresponding to a^^^ to be roughly 0.95 in the latter case,
whereas in the former we have roughly 0.9 in Figure 3(a),
and 0.88 in Figure 3(b).

$^3$The analysis in [21] was performed for the large limit of $k, m$ and $n$,
where both $k/m$ and $m/n$ approach fixed constants.
Fig. 3. Means $p(a) = \mathbb{E}U_n(a)$ for $m = 50$ and the (a) Bernoulli and (b)
Uniform cases.

Remark 1. Exponential bounds on $\Pr\{\min_S \sigma^2_{\min}(\boldsymbol{A}_S) < 1 - \delta\}$
and $\Pr\{\max_S \sigma^2_{\max}(\boldsymbol{A}_S) > 1 + \delta\}$ for $\max(\delta, \sqrt{k/m}) \le \sqrt{2} - 1$,
employed in "worst-case" analyses, give
the optimal $m = O(k\log(n/k))$ rate, see [1], [12], 071/ .
However the implicit constants are inherently not too small
(i.e., these constants cannot be improved).

These comparisons motivate "average-case" analysis.
Marked out on Figures 2 and 3 are the ranges in which $\sigma^2_{\max}$
and $\sigma^2_{\min}$ must lie to apply Theorem A ("worst-case" analysis).
In the cases shown above, the observations are somewhat
disappointing: even for small $k$ values, a substantial fraction
of eigenvalues lie outside of the required range. Thankfully,
there exist "average-case" guarantees, e.g., previous
Theorems B and C, addressed in the next section.

IV. U-Statistics & "Average-case" Recovery Guarantees

A. Counting argument using U-statistics 

Previously we had explained how the invertibility
conditions required by Theorems B and C naturally relate to
U-statistics. We now go on to discuss the other conditions,
whereby the relationship may not be immediate. We begin
with the projections conditions, in particular the worst-case
projections condition. For given $\boldsymbol{\Phi}$, we need to upper bound
the fraction of subsets $S$, for which there exists at least one
column $\boldsymbol{\phi}_j$ where $j \notin S$, such that $\|\boldsymbol{\Phi}_S^\dagger\boldsymbol{\phi}_j\|_\infty$ exceeds some
value $a$. To this end, let $\mathcal{R}$ denote a size-$(k+1)$ subset, and
$\mathcal{R} \setminus \{j\}$ is the size-$k$ subset excluding the index $j$. Consider
the bounded kernel $g : \mathbb{R}^{m \times (k+1)} \times \mathbb{R} \to \mathbb{R}_{[0,1]}$ set as

$$g(\mathbf{A}, a) = \frac{1}{k+1} \sum_{j=1}^{k+1} 1\left\{\|\mathbf{A}_{\mathcal{R} \setminus \{j\}}^\dagger \mathbf{a}_j\|_\infty > a\right\}, \qquad (12)$$

where here $\mathcal{R} = \{1, 2, \cdots, k+1\}$, and $\mathbf{a}_j$ denotes the $j$-th
column of $\mathbf{A}$. Consider the U-statistic with bounded kernel
(12). We claim that

$$(n - k) \cdot U_n(a) = \frac{n-k}{\binom{n}{k+1}} \sum_{\mathcal{R}} \frac{1}{k+1} \sum_{j \in \mathcal{R}} 1\left\{\|\boldsymbol{\Phi}_{\mathcal{R} \setminus \{j\}}^\dagger \boldsymbol{\phi}_j\|_\infty > a\right\} = \frac{1}{\binom{n}{k}} \sum_{S} \sum_{j \notin S} 1\left\{\|\boldsymbol{\Phi}_S^\dagger \boldsymbol{\phi}_j\|_\infty > a\right\},$$

where the summations over $\mathcal{R}$ and $S$ are over all size-$(k+1)$
subsets, and all size-$k$ subsets, respectively. The first equality
follows from Definition 1 and (12). The second equality
requires some manipulation. First the coefficient $\binom{n}{k}^{-1}$ follows
from the binomial identity $\binom{n}{k+1} \cdot (k+1) = \binom{n}{k} \cdot (n-k)$.
Next for some subset $S$ and index $j$, write the indicator
$1\{\|\boldsymbol{\Phi}_S^\dagger\boldsymbol{\phi}_j\|_\infty > a\}$ as $I_{S,j}$ for brevity's sake. By similar
counting that proves the previous binomial identity, we argue
$\sum_{\mathcal{R}} \sum_{j \in \mathcal{R}} I_{\mathcal{R} \setminus \{j\}, j} = \sum_S \sum_{j \notin S} I_{S,j}$, which then proves
the claim. Imagine a grid of "pigeon-holes", indexed by pairs
$(S, j)$, where $j \notin S$. For each size-$(k+1)$ subset $\mathcal{R}$, we assign
$k+1$ indicators $I_{\mathcal{R} \setminus \{j\}, j}$ to $k+1$ pairs $(S, j)$. No
"pigeon-hole" gets assigned more than once. In fact we infer from the
binomial identity that every "pigeon-hole" is assigned
exactly once, and the argument is complete.
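The pigeon-hole argument can be checked by brute force on small index sets. The Python sketch below enumerates every size-(k+1) subset, removes one index at a time, and confirms that each pair (S, j) with j outside S is produced exactly once; it also checks the binomial identity. The function name and the test sizes are illustrative choices.

```python
import itertools
import math


def check_pigeonholes(n: int, k: int) -> bool:
    """True if the pairs (R minus {j}, j), over all size-(k+1) subsets R and j in R,
    hit every pair (S, j) with |S| = k and j not in S exactly once."""
    counts = {}
    for big in itertools.combinations(range(n), k + 1):
        for j in big:
            small = tuple(x for x in big if x != j)
            counts[(small, j)] = counts.get((small, j), 0) + 1
    holes = [(s, j)
             for s in itertools.combinations(range(n), k)
             for j in range(n) if j not in s]
    return len(holes) == len(counts) and all(counts.get(h, 0) == 1 for h in holes)


if __name__ == "__main__":
    n, k = 7, 3
    print(check_pigeonholes(n, k))
    # Binomial identity used for the coefficient in front of the sum.
    print(math.comb(n, k + 1) * (k + 1) == math.comb(n, k) * (n - k))
```

Both the number of assignments, C(n, k+1)(k+1), and the number of pigeon-holes, C(n, k)(n − k), count the same set of pairs, which is why no hole can be left empty once none is hit twice.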

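Similarly for the small projections condition, we define a
different bounded kernel $g : \mathbb{R}^{m \times (k+1)} \times \mathbb{R} \to \mathbb{R}_{[0,1]}$ as

$$g(\mathbf{A}, a) = \frac{1}{2^k (k+1)} \sum_{\ell=1}^{2^k} \sum_{j=1}^{k+1} 1\left\{|(\mathbf{A}_{\mathcal{R} \setminus \{j\}}^\dagger \mathbf{a}_j)^T \boldsymbol{\beta}_\ell| > a\right\}, \qquad (13)$$

where $\mathcal{R} = \{1, 2, \cdots, k+1\}$, and $\mathbf{a}_j$ denotes the $j$-th column
of $\mathbf{A}$, and $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \cdots, \boldsymbol{\beta}_{2^k}$ enumerate all $2^k$ unique
sign-vectors in the set $\{-1, 1\}^k$. By similar arguments as before,
we can show that the U-statistic $U_n(a)$ of $\boldsymbol{\Phi}$ corresponding to
the bounded kernel (13) satisfies

$$(n - k) \cdot U_n(a) = \frac{1}{2^k \binom{n}{k}} \sum_{\ell=1}^{2^k} \sum_{S} \sum_{j \notin S} 1\left\{|(\boldsymbol{\Phi}_S^\dagger \boldsymbol{\phi}_j)^T \boldsymbol{\beta}_\ell| > a\right\}.$$

For indicators $I_{S,j}$, note that $\sum_{j \notin S} I_{S,j} \ge 1$ if at least one
indicator satisfies $I_{S,j} = 1$, and we have proved the following.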

Proposition 1. Let $U_n(a_3)$ be the U-statistic of $\boldsymbol{\Phi}$
corresponding to the bounded kernel $g(\mathbf{A}, a_3)$ in (12). Then the
fraction of subsets $S$ of size $k$, for which the worst-case
projections condition is violated for some $a_3 \in \mathbb{R}$, is at most
$(n - k) \cdot U_n(a_3)$. Similarly if $U_n(a_2)$ corresponds to $g(\mathbf{A}, a_2)$
in (13), the fraction of sign-subset pairs $(\boldsymbol{\beta}, S)$, for which the
small projections condition is violated for some $a_2 \in \mathbb{R}$, is at
most $(n - k) \cdot U_n(a_2)$.
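The first half of Proposition 1 can be illustrated numerically on a small matrix. The Python sketch below (NumPy assumed available) computes, for every size-k subset S, how many outside columns j give a projection norm above a threshold a; it then returns both the exact fraction of violating subsets and the average indicator count per subset, which by the counting argument equals (n − k) · U_n(a). The function name, the `norm_ord` parameter (which norm is applied to the projected column) and the toy sizes are assumptions made for illustration.

```python
import itertools

import numpy as np


def violation_fraction_and_bound(phi, k, a, norm_ord=np.inf):
    """Return (fraction of size-k subsets with a violating outside column,
    average number of violating outside columns per subset)."""
    n = phi.shape[1]
    subsets = list(itertools.combinations(range(n), k))
    violated = 0
    indicator_total = 0
    for s in subsets:
        pinv_s = np.linalg.pinv(phi[:, list(s)])  # Moore-Penrose pseudoinverse
        hits = sum(
            int(np.linalg.norm(pinv_s @ phi[:, j], ord=norm_ord) > a)
            for j in range(n) if j not in s
        )
        violated += int(hits >= 1)
        indicator_total += hits
    return violated / len(subsets), indicator_total / len(subsets)


# Toy Gaussian matrix; the sizes and the threshold are arbitrary assumptions.
rng = np.random.default_rng(2)
phi = rng.standard_normal((12, 14)) / np.sqrt(12)
fraction, bound = violation_fraction_and_bound(phi, k=2, a=0.5)
print(fraction, bound)
```

The bound is a union-type bound: it counts a subset once per violating column, so it is tight when violating subsets typically have a single offending column and loose when many columns violate at once.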

Referring back to Theorem B, we point out that the small
projections condition is more stringent than the worst-case
projections condition. We mean the following: in the former
case, the value $a_2$ must be chosen such that $a_2 < 1$; in the
latter case, the value $a_3$ is allowed to be larger than 1, and its size
only affects the constant $2a_3/(1 - a_2)$ appearing in the error
estimate $\|\boldsymbol{\alpha}^*_S - \boldsymbol{\alpha}_S\|_1$. In fact if the signal $\boldsymbol{\alpha}$ is $k$-sparse, then
$\|\boldsymbol{\alpha} - \boldsymbol{\alpha}_k\|_1 = 0$ and the size of $a_3$ is inconsequential, i.e., the
worst-case projections condition is not required in this special
case. In this special case, it is best to set $a_2 = 1 - \epsilon$ for some
arbitrarily small $\epsilon$. Theorem B is in fact a stronger version
of Fuchs' early work on $\ell_0/\ell_1$-equivalence [5]. In the same
respect, Donoho & Tanner also produced early seminal results
from counting faces of random polytopes [22], [23].

Figure 4 shows empirical evidence, where the $k, m, n$ values 
are inspired by practical system sizes taken from an implementation 
paper [17]. These experiments consider $\Phi$ sampled from 
Gaussian matrices $A$, exactly $k$-sparse signals with non-zero 
$\alpha_i$ sampled from $\{-1, 1\}$, and use $\ell_1$-minimization recovery 
(1). Figure 4(a) plots simulated (sparsity pattern recovery) 
results for 3 measurement sizes $m = 50$, 100 and 150 and 
block sizes $n > 200$ and $n < 3000$. For example the contour 
marked "0.1" delineates the $k, n$ values for which recovery 
fails for a 0.1 fraction of (random) sparsity patterns (sign-subset 
pairs $(\beta, S)$). We examine the U-statistic $U_n(a_2)$ with 
kernel (13), related to the small projections condition. Since 
$A$ has Gaussian distribution, we set $a_2 = 1$ in the kernel 
$g(A, a_2)$, as $\Pr\{|(A_S^\dagger A_j)^T \beta| = 1\} = 0$ for any $(\beta, S)$ and 
$j \notin S$. Figure 4(b) plots the expectation $(n - k) \cdot p(1)$, where 
$p(1) = E\,U_n(1) = E\,g(A_{\mathcal{R}}, 1)$ for any size-$(k + 1)$ subset $\mathcal{R}$. 
Again the contour marked "0.1" delineates the $k, n$ values for 
which $(n - k) \cdot p(1) = 0.1$. Here the values $p(1)$ are empirical. 
We observe that both Figures 4(a) and (b) are remarkably 
close for fractions 0.5 and smaller. Figure 4(c) incorporates 
the large deviation error $\epsilon_n$ given in Theorem 1 (in doing so, 
we assume $n$ sufficiently large). The bound is still reasonably 
tight for fractions $< 0.5$. We compare with Donoho 
& Tanner's recent (also "average-case") results for $\ell_1$-recovery (for 
only the noiseless case), taken from [23]. For fractions 0.5 
and 0.01, we observe that for system parameters $m = 50$ 
and $n < 1000$ (chosen in the hardware implementation [17]), 
we do not obtain reasonable predictions from [23]. For $m = 100$, the 
bounds [23] work only for very small block lengths $n < 300$. 
The only reasonable case here is $m = 150$, where the 
bounds [23] perform better than ours only for lengths $n < 400$ 
(i.e., Figure 4(c) shows that for $n = 300$, the large deviation 
bounds predict a 0.01 fraction of size $k = 5$ unrecoverable 
sparsity patterns, but [23] predicts a 0.01 fraction of size $k = 11$ 
unrecoverable sparsity patterns). 
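
The mean parameter $(n - k) \cdot p(1)$ discussed above can be estimated by simulation. The following is a minimal sketch, not the code used for Figure 4: the function name `estimate_p`, the trial count and the seed are assumptions made here for illustration, and a single random sign vector and a single outside column are drawn per trial (by symmetry of IID Gaussian columns this has the same expectation as the kernel average).

```python
import numpy as np

def estimate_p(m, k, a=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of p(a) = Pr{ |(A_S^+ A_j)^T beta| > a }.

    Hypothetical helper: columns are IID Gaussian with variance 1/m,
    the first k columns play the role of A_S and the last one of A_j.
    """
    rng = np.random.default_rng(seed)
    failures = 0
    for _ in range(trials):
        A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, k + 1))
        A_S, A_j = A[:, :k], A[:, k]
        beta = rng.choice([-1.0, 1.0], size=k)   # random sign vector
        proj = np.linalg.pinv(A_S) @ A_j          # A_S^+ A_j, length k
        failures += int(abs(proj @ beta) > a)
    return failures / trials

if __name__ == "__main__":
    m, n = 150, 1000
    for k in (5, 10, 20):
        p = estimate_p(m, k)
        print(f"k={k:2d}  p(1)~{p:.4f}  (n-k)*p(1)~{(n - k) * p:.2f}")
```

With only a few thousand trials, small values of $p(1)$ are resolved coarsely; contours at fractions such as 0.01 would need many more trials.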

The above experiments suggest the deviation error $\epsilon_n(a)$ 
in Theorem 1 to be over-conservative. Fortunately in the 
next two subsections (pertaining to the U-statistics treatment of 
$\ell_1$-recovery Theorem B (Subsection IV-B), and LASSO recovery 
Theorem C (Subsection IV-C)), this conservativeness does not 
show up from a rate standpoint (it only shows up in implicit 
constants). In fact by empirically "adjusting" these constants, 
we find good measurement rate predictions (akin to moving 
from Figure 4(c) to (b)). 



B. Rate analysis for $\ell_1$-recovery (Theorem B) 

In "worst-case" analysis, it is well known that it is sufficient 
to have measurements $m$ on the order of $k \log(n/k)$, in order 
to have the restricted isometry constants $\delta_k$ 
satisfy the conditions in Theorem A. We now go on to show 
that for "average-case", a similar expression for this rate can 
be obtained. To this end we require tail bounds on salient 
quantities. Such bounds have been obtained for the small 
projections condition, see [6], [7], [25], where typically an 
equiprobable distribution is assumed over the sign-vectors $\beta_S$. 
To our knowledge these techniques were born from considering 
deterministic matrices. Since $\Phi$ is randomly sampled 
here, we proceed slightly differently (though essentially using 
similar ideas) without requiring this random signal model. For 
simplicity, the bound assumes zero mean matrix entries, either 
i) Gaussian or ii) bounded. 

Fig. 4. Gaussian case. Comparing (a) empirical results for $\ell_1$-minimization recovery, (b) mean parameter $(n - k) \cdot p(1)$ (empirically obtained), and (c) after 
accounting for large deviations (Thm. 1). We show cases $m = 50$, 100 and 150. We also compare with Donoho & Tanner's (DT) large deviation bounds [23]. 



Proposition 2. Let $A$ be an $m \times n$ random matrix, whereby 
its columns $A_i$ are identically distributed. Assume every entry 
$A_{ij}$ of $A$ has zero mean, i.e., $E A_{ij} = 0$. Let every $A_{ij}$ be either 
i) Gaussian with variance $1/m$, or ii) bounded RVs satisfying 
$|A_{ij}| \leq 1/\sqrt{m}$. Let the rows $[A_{i1}, A_{i2}, \cdots, A_{in}]$ of $A$ be IID. 

Let $S$ be a size-$k$ subset, and let index $\omega$ be outside of $S$, 
i.e., $\omega \notin S$. Then for any sign vector $\beta$ in $\{-1, 1\}^k$, we have 

$$ \Pr\left\{ |(A_S^\dagger A_\omega)^T \beta| > a \right\} \leq 2\exp\left(-\frac{m a^2 \delta}{2k}\right) + \Pr\left\{\sigma^2_{\min}(A_S) < \delta\right\} \tag{14} $$

for any positive $\delta \in \mathbb{R}$. 

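Proof: For $\tau \in \mathbb{R}$, let $\mathcal{E}(\tau) = \{\beta^T (A_S^T A_S)^\dagger \beta \leq \tau\}$, 
where $\mathcal{E}(\tau)$ is a probabilistic event. Let $\mathcal{E}^c(\tau)$ denote the 
complementary event. Bound the probability as 

$$ \Pr\left\{ |(A_S^\dagger A_\omega)^T \beta| > a \right\} \leq \Pr\left\{ |(A_S^\dagger A_\omega)^T \beta| > a \,\middle|\, \mathcal{E}(\tau) \right\} + \Pr\{\mathcal{E}^c(\tau)\}. \tag{15} $$

We upper bound the first term as follows. Denote constants 
$c_1, c_2, \cdots, c_m$. For entries $(A_\omega)_i$ of $A_\omega$, consider the 
sum $\sum_{i=1}^m c_i \cdot (m^{1/2} A_\omega)_i = \sum_{i=1}^m c_i X_i$, where the RVs 
$X_i$ satisfy $X_i = (\sqrt{m} A_\omega)_i$. By standard arguments (see 
Supplementary Material B) we have the double-sided bound 

$$ \Pr\left\{ \left|\sum_{i=1}^m c_i X_i\right| > m t \right\} \leq 2\exp\left(-\frac{(mt)^2}{2\|c\|_2^2}\right), $$

where the vector $c$ equals $[c_1, c_2, \cdots, c_m]^T$. 
Next write $(A_S^\dagger A_\omega)^T \beta = m^{-1} (\sqrt{m} \cdot \beta^T A_S^\dagger)(\sqrt{m} \cdot A_\omega)$. When 
conditioning on $\beta^T A_S^\dagger$, then $\sqrt{m} \cdot \beta^T A_S^\dagger$ is fixed, say equals 
some vector $c$. Put $X_i = (\sqrt{m} A_\omega)_i$; the $X_i$'s are independent 
(by the assumed independence of the rows of $A$). Then use the 
above bound for $\Pr\{|\sum_{i=1}^m c_i X_i| > mt\}$, set $t = a$ and conclude 

$$ \Pr\left\{ |(A_S^\dagger A_\omega)^T \beta| > a \,\middle|\, \beta^T A_S^\dagger \right\} \leq 2\exp\left(-\frac{(ma)^2}{2m\|\beta^T A_S^\dagger\|_2^2}\right) = 2\exp\left(-\frac{m a^2}{2 \cdot \beta^T (A_S^T A_S)^\dagger \beta}\right) \tag{16} $$

where the last equality follows from the identity $A_S^\dagger (A_S^\dagger)^T = 
(A_S^T A_S)^\dagger$. Further conclude that the first term in (15) is 
bounded by $2\exp(-ma^2/(2\tau))$, due to further conditioning 
on the event $\mathcal{E}(\tau) = \{\beta^T (A_S^T A_S)^\dagger \beta \leq \tau\}$. 

To bound the second term, let $\varsigma_{\max}(A)$ denote the maximum 
eigenvalue of matrix $A$. Since $(A_S^T A_S)^\dagger$ is positive 
semidefinite, note that $\beta^T (A_S^T A_S)^\dagger \beta$ is upper bounded by 
$\|\beta\|_2^2 \cdot \varsigma_{\max}((A_S^T A_S)^\dagger)$, which equals $k \cdot \varsigma_{\max}((A_S^T A_S)^\dagger)$. Furthermore 
$\varsigma_{\max}((A_S^T A_S)^\dagger) \leq 1/\sigma^2_{\min}(A_S)$, where here $\sigma_{\min}(A)$ 
is the minimum singular value of $A$. Thus $\Pr\{\mathcal{E}^c(\tau)\} \leq 
\Pr\{k/\sigma^2_{\min}(A_S) > \tau\}$. Finally put $\tau = k/\delta$ to get $\Pr\{\mathcal{E}^c(\tau)\} \leq 
\Pr\{\sigma^2_{\min}(A_S) < \delta\}$. ■ 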
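
The double-sided bound used for the first term can be sanity-checked numerically. The sketch below is illustrative only: the helper names `tail_bound` and `empirical_tail`, the fixed vector `c` (standing in for $\sqrt{m} \cdot \beta^T A_S^\dagger$ after conditioning) and the threshold are assumptions chosen here, and the $X_i$ are drawn as unit-variance Gaussian or $\pm 1$ variables to mimic cases i) and ii).

```python
import math
import random

def tail_bound(c, s):
    """Bound 2*exp(-s^2 / (2*||c||_2^2)) on Pr{ |sum_i c_i X_i| > s }."""
    norm_sq = sum(ci * ci for ci in c)
    return 2.0 * math.exp(-s * s / (2.0 * norm_sq))

def empirical_tail(c, s, draw, trials=20000, seed=1):
    """Fraction of trials in which |sum_i c_i X_i| exceeds s."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(ci * draw(rng) for ci in c)
        hits += int(abs(total) > s)
    return hits / trials

def gaussian(rng):
    return rng.gauss(0.0, 1.0)        # case i): unit-variance Gaussian X_i

def rademacher(rng):
    return rng.choice((-1.0, 1.0))    # case ii): bounded, |X_i| <= 1

if __name__ == "__main__":
    setup = random.Random(0)
    c = [setup.uniform(-1.0, 1.0) for _ in range(50)]   # arbitrary fixed vector
    s = 2.0 * math.sqrt(sum(ci * ci for ci in c))       # threshold at 2*||c||_2
    print("bound     :", tail_bound(c, s))
    print("gaussian  :", empirical_tail(c, s, gaussian))
    print("rademacher:", empirical_tail(c, s, rademacher))
```

In both cases the empirical tail should sit below the bound, which is loose by a constant factor at this threshold.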

Proposition 2 is used as follows. First recall that the previous 
Proposition 1 allows us to upper bound the fraction $u_2$ of sign-subset 
pairs $(\beta, S)$ failing the small projections condition, with 
the (scaled) U-statistic $(n - k) \cdot U_n(a_2)$ with kernel $g$ in (13) 
and $|S| = k$. By Theorem 1 the quantity $(n - k) \cdot U_n(a_2)$ concentrates 
around $(n - k) \cdot p(a_2)$, where $p(a_2) = E\,g(A_{\mathcal{R}}, a_2)$, 
where $g$ in (13) is defined for size-$(k + 1)$ subsets $\mathcal{R}$. We use 
Proposition 2 to upper estimate $p(a_2)$ using the RHS of (14). 
Indeed verify that $p(a_2) = 2^{-k} \sum_{\beta} \Pr\{|(A_S^\dagger A_\omega)^T \beta| > a_2\}$ 
for any $S$ and $\omega \notin S$, and the bound (14) holds for any $\beta$. 
Now $p(a_2)$ is bounded by two terms. By $u_2 \leq (n - k) \cdot U_n(a_2)$, 
thus to have $u_2$ small, we should have the (scaled) first term 
$2(n - k) \cdot \exp(-m a_2^2 \delta/(2k))$ of (14) to be at most some small 
fraction $u$. This requires 

$$ m \geq \mathrm{const} \cdot k \log\left(\frac{n - k}{u}\right) \tag{17} $$

with $\mathrm{const} = 2/(a_2^2 \delta)$ (and we dropped an insignificant 
$\log 2$ term). Next, for $m \geq 2k$ and $\delta \leq (0.29)^2$, we 
can bound the second term $\Pr\{\sigma^2_{\min}(A_S) < \delta\}$ of (14) 
by $\exp(-m \cdot (0.29 - \sqrt{\delta})^2/c_1)$ where $c_1$ is some constant, 
see [27], Theorem 5.39. Roughly speaking, $\sigma_{\min}(A_S) \geq 0.29$ 
with "high probability". Figures 2 and 3 (in the previous 
Section III) empirically support this fact. Again to have $u_2$ 
small the second term of (14) must be small. This requires 
$(n - k) \cdot \exp(-m \cdot (0.29 - \sqrt{\delta})^2/c_1) \leq u$ for some small 
fraction $u$, in which it suffices to have $m$ satisfy (17) with 
$\mathrm{const} = c_1/(0.29 - \sqrt{\delta})^2$. 

Footnote (on the bound for the second term): For $m \geq 2k$, we have $\Pr\{\sigma_{\min}(A) \leq c \cdot 0.29 - t\} \leq \Pr\{\sigma_{\min}(A) \leq 
1 - c \cdot \sqrt{k/m} - t\} \leq \exp(-mt^2/c_1)$ for some constants $c, c_1$, where $A$ has 
size $m \times k$ and with proper column normalization. For simplicity we drop 
the constant $c$ in this paper; one simply needs to add $c$ in appropriate places 
in the exposition. In particular for the Gaussian and Bernoulli cases $c = 1$, 
and $c_1 = 2$ and $c_1 = 16$, respectively, see Theorem B, [28]. 
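
The constant 0.29 is $1 - \sqrt{1/2} \approx 0.2929$, i.e., the value of $1 - \sqrt{k/m}$ at the boundary $m = 2k$. The following sketch samples Gaussian submatrices and reports how often $\sigma_{\min}(A_S) \geq 0.29$; the helper name `sigma_min_samples` and the chosen sizes are assumptions for illustration and do not reproduce Figures 2 and 3.

```python
import numpy as np

def sigma_min_samples(m, k, trials=500, seed=0):
    """Smallest singular values of m-by-k Gaussian matrices, entry variance 1/m."""
    rng = np.random.default_rng(seed)
    out = np.empty(trials)
    for t in range(trials):
        A_S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, k))
        # singular values are returned in descending order, so take the last
        out[t] = np.linalg.svd(A_S, compute_uv=False)[-1]
    return out

if __name__ == "__main__":
    m = 150
    print("1 - sqrt(1/2) =", 1.0 - np.sqrt(0.5))
    for k in (5, 20, 50):
        s = sigma_min_samples(m, k)
        print(f"k={k:2d}  mean sigma_min={s.mean():.3f}  "
              f"fraction >= 0.29: {(s >= 0.29).mean():.3f}  "
              f"1 - sqrt(k/m) = {1 - np.sqrt(k / m):.3f}")
```

For $k$ well below $m/2$ the smallest singular value concentrates near $1 - \sqrt{k/m}$, comfortably above 0.29.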

For the invertability condition in Theorem B, we also 
need to upper bound the corresponding fraction $u_1$ of size-$k$ 
subsets $S$ failing it. We simply use a U-statistic $U_n(a_1)$ with kernel 
$g(A, a_1) = \mathbb{1}\{\sigma_{\min}(A) < a_1\}$ for some positive $a_1$ (see also 
Theorem C). Here Proposition 1 is not needed. To make $p(a_1)$ 
small, where $p(a_1) = E\,g(A_S, a_1)$, use the previous bound 
$p(a_1) \leq \exp(-m \cdot (0.29 - a_1)^2/c_1)$, where we set $a_1 = \sqrt{\delta}$ 
with $a_1 \leq 0.29$. Clearly $p(a_1)$ cannot exceed some fraction $u$, 
if $m$ satisfies (17) with $\mathrm{const} = c_1/(0.29 - a_1)^2$. 

For the time being consider exactly $k$-sparse signals $\alpha$. 
In this special case the worst-case projections condition in 
Theorem B is superfluous (i.e., with no consequence $a_3$ can 
be arbitrarily big); only the invertability and small projections 
conditions are needed. While we have yet to consider the 
large deviation error $\epsilon_n(a)$ from Theorem 1, doing so will 
not drastically change the rate. For $U_n(a)$ with kernel $g$ and 
$p(a)$, where $p(a) = E\,g(A, a)$, almost surely 

$$ U_n(a) \leq p(a) + \epsilon_n(a) \leq (p(a))^{1/2} + \sqrt{2 p(a)\, \omega^{-1} \log \omega} \leq (p(a))^{1/2} \left(1 + \sqrt{2\omega^{-1} \log \omega}\right) \tag{18} $$

where the second inequality follows because $p(a) \leq 1$, 
and by setting $\omega = n/k$. Taking the log of the RHS, we 
obtain $(1/2) \log p(a) + \log(1 + \sqrt{2\omega^{-1} \log \omega})$. Note $\log(1 + 
\sqrt{2\omega^{-1} \log \omega}) \leq \sqrt{2\omega^{-1} \log \omega}$, since $\log(1 + \alpha) \leq \alpha$ holds 
for all positive $\alpha$. For the small projections condition, bound 
$(p(a))^{1/2}$ by the sum of the square-roots of each term in (14). 
Then to have $u_2 \leq (n - k) \cdot U_n(a_2) \leq 2u$, it follows similarly 
as before that it suffices that (see Supplementary Material C) 

$$ m \geq \mathrm{const} \cdot k \left[ \log\left(\frac{n - k}{u}\right) + \sqrt{2 \cdot (k/n) \log(n/k)} \right] \tag{19} $$

with $\mathrm{const} = \max(4/(a_2^2 \delta),\ 2c_1/(0.29 - \sqrt{\delta})^2)$ where we had 
set $\sqrt{\delta} = a_1$ (we dropped an insignificant $\log 2$ term). For the 
invertability condition do the same. To have $u_1 = U_n(a_1) \leq u$ 
it suffices that $m$ satisfies (19) with the same const. Observe 
that the term $\sqrt{2 \cdot (k/n) \log(n/k)}$ is at most 1, and vanishes 
with high undersampling (small $k/n$). Hence (17) and (19) are 
similar from a rate standpoint. 

We conclude the following: for exactly $k$-sparse signals 
the rate (19) suffices to recover at least $1 - 3u$ fraction of 
sign-subset $(\beta, S)$ pairs. While const in (19) must be at 
least 4 (recall that Figure 4(c) was somewhat pessimistic), 
for matrices with Gaussian entries we empirically find that 
const is inherently smaller, whereby $\mathrm{const} \approx 1.8$. This is 
illustrated in Figure 5 for two fractions 0.1 and 0.01 of 
unrecoverable sign-subset pairs. We observe good match with 
simulation results shown in the previous Figure 4(a), and 
quantities $(n - k) \cdot p(1)$ plotted in Figure 4(b). For example, 
$m = 150$ suffices for a 0.01 fractional recovery failure, for 
$n = 300 \sim 1000$ and $k = 6 \sim 7$, and for 0.1 fraction then 
$k = 7 \sim 10$. We conjecture possible improvement for const. 

Fig. 5. Measurement rates predicted by equation (19), with const taken to 
equal 1.8, required to recover at least $1 - 3u = 0.9$ and 0.99 fractions of sign-subset 
pairs $(\beta, S)$ (when the signal is exactly $k$-sparse), shown respectively 
in (a) and (b). Panels: (a) unrecoverable fraction 0.1, (b) unrecoverable fraction 0.01; horizontal axis: block length $n$. 

In the more general setting for approximately $k$-sparse 
signals, we can also have rate (19). To see this, observe 
that Proposition 2 also delivers an exponential bound for the 
worst-case projections condition, see (12). This is because 
$\|A_S^\dagger A_\omega\|_1 = \max_{\beta_\ell : 1 \leq \ell \leq 2^k} |(A_S^\dagger A_\omega)^T \beta_\ell|$, and we take a 
union bound over $2^k$ terms. Set $a_3 = a_2 \sqrt{k}$, where $a_2$ and $a_3$ 
respectively correspond to the small projections and worst-case projections 
conditions. Then we proceed similarly as before (see Supplementary 
Material C) to show that the rate for recovering at 
least $1 - 5u$ fraction of $(\beta, S)$ pairs suffices to be (19). The 
following is the main result summarizing the exposition so far. 

Theorem 2. Let $\Phi$ be an $m \times n$ matrix, where assume $n$ 
sufficiently large for Theorem 1 to hold. Sample $\Phi = A$ 
whereby the entries $A_{ij}$ are IID, and are Gaussian or bounded 
(as stated in Proposition 2). Then all three conditions in 
$\ell_1$-recovery guarantee Theorem B for $(\beta, S)$ with $|S| = k$, 
with the invertability condition taken as $\sigma_{\min}(A_S) \geq a_1$ 
with $a_1 \leq 0.29$, and with $a_3 = a_2 \sqrt{k}$, are satisfied for 
$u_1 + u_2 + u_3 = 5u$ for some small fraction $u$, if $m$ is on the 
order of (19) with $\mathrm{const} = \max(4/(a_1 a_2)^2,\ 2c_1/(0.29 - a_1)^2)$, 
and $c_1$ depends on the distribution of the $A_{ij}$'s. Note $\mathrm{const} \geq 4$. 

In the exactly $k$-sparse case where only the first 2 conditions 
are required, this improves to $u_1 + u_2 = 3u$. 

We end this subsection with two comments on the rate (19) 
derived here for "average-case" analysis. Firstly (19) is very 
similar to that of $k \log(n/k)$ for "worst-case" analysis. This 
justifies the counting employed in the previous Subsection IV-A 
Proposition 1, and is reassuring since we know that "worst-case" 
analysis provides the optimal rate [1], [11]. Secondly to 
have (19) hold for the approximately $k$-sparse case, we lose a 
factor of $\sqrt{k}$ in the error estimate $\|\alpha^*_S - \alpha_S\|_1$, as compared 
to "worst-case" Theorem A. This is because we need to set 
$a_3 = a_2 \sqrt{k}$, as mentioned in the previous paragraph. However, 
the "average-case" analysis here achieves our primary goal, 
that is to predict well for system sizes $k, m, n$ when "worst-case" 
analysis becomes too pessimistic. 

Footnote (on $\mathrm{const} \approx 1.8$): Comparing (19) and (17) and the respective expressions for const, 
dropping const from 4 to 1.8 is akin to ignoring the deviation error $\epsilon_n(a)$. 
This, and as Figure 4 suggests, the U-statistic "means" $(n - k) \cdot p(1)$ seem 
to predict recovery remarkably well, with similar rates to (19), and inherent 
const smaller than that derived here. 

Footnote (on the approximately $k$-sparse rate): We used an assumption that $(n - k)/u$ is suitably larger than 2, see 
Supplementary Material C. 

Fig. 6. Empirical LASSO recovery performance, Bernoulli case. In (a) the 
non-zero signal magnitudes $|\alpha_i|$ equal 1, and in (b) they are in $\mathbb{R}_{[0,1]}$. Noise 
variances denoted $c_Z^2$. Horizontal axis: sparsity $k$. 

C. Rate analysis for LASSO (Theorem C) 

Next we move on to the LASSO estimate of [6]. Recall 
from (6) that the regularizer depends on the noise standard 
deviation $c_Z$, and the term $\theta_n = (1 + a)\sqrt{2 \log n}$ that depends 
on block length $n$ and some non-negative constant $a$ that we 
set. This constant $a$ impacts performance [6]. For matrices 
with Bernoulli entries, Figure 6 shows recovery failure rates 
for two data sets $m = 50, n = 1000$ and $m = 150, n = 1000$; 
the sparsity patterns (sign-subset pairs $(\beta, S)$) were chosen at 
random, and failure rates are shown for various sparsity values 
$k$, and noises $c_Z$. In Figure 6(a) we set $a = 0$, and in (b) we 
set $a = 1$. Also, in (a) the non-zero signal values $\alpha_i$ are 
in $\{1, -1\}$, and in (b) the magnitudes $|\alpha_i|$ are in $\mathbb{R}_{[0,1]}$. The performances are 
clearly different. "Threshold-like" behavior is seen in (a) for 
both data sets, whereby the performances stay the same for $c_Z$ 
in the range 5 x 10^^ ^ 1 x 10~^, and then catastrophically 
failing for = 1 x 10~^. However in (6), for various cz the 
performances seem to be limited by a "noise-floor". We see 
that in the noiseless limit (more specifically when $c_Z \to 0$), 
the performances become the same. In this subsection, we 
apply U-statistics on the various conditions of Theorem C; 
in particular the invertability and small projections conditions 
have already been discussed in the previous subsection. We 
account for the observations in Figure 6. 

In the noiseless limit, the previously derived rate (19) 
holds. Here, the regularizer in (6) becomes so small that $a$ 
(equivalently $\theta_n$) does not matter. As mentioned in [5], LASSO 
then becomes equivalent to $\ell_1$-minimization, hence the 
(noiseless) performances in Figures 6(a) and (b) are the same. 
That is, in this special case the rate (19) suffices to recover at 
least $1 - 3u$ fraction of $(\beta, S)$ pairs. To test, take $k = 4$, $n = 3000$, 
and fraction $1 - 3u = 1 - 6 \times 10^{-6}$; then (19) with $\mathrm{const} = 1.8$ 
gives 153, close to $m$ here which is set to 150. 
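
The arithmetic of this check can be reproduced directly from (19). The snippet below is a small sketch; the function name `rate_19` is introduced here for illustration, and $u = 2 \times 10^{-6}$ (so that $3u = 6 \times 10^{-6}$) is assumed because it is the value that reproduces the quoted figure of 153.

```python
import math

def rate_19(k, n, u, const=1.8):
    """Right-hand side of (19): const * k * [log((n-k)/u) + sqrt(2*(k/n)*log(n/k))]."""
    log_term = math.log((n - k) / u)
    sqrt_term = math.sqrt(2.0 * (k / n) * math.log(n / k))
    return const * k * (log_term + sqrt_term)

if __name__ == "__main__":
    m = rate_19(k=4, n=3000, u=2e-6)
    print(f"predicted m = {m:.2f}")   # about 153
```

The square-root term contributes only about 0.13 out of roughly 21.3 inside the bracket here, in line with the remark that it vanishes under high undersampling.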

In the noisy case, we are additionally concerned with 
the noise conditions i) and ii), conditions (7) and (8), and 
invertability projections. Recall that the noise conditions are 
satisfied with probability $1 - n^{-1}(2\pi \log n)^{-1/2}$, that goes to 1 
superlinearly (Proposition 4, Supplementary Material A). 
The remaining conditions are influenced by the value $a$ set in 
the $\theta_n$ regularization term in (6). 

In condition (7), the value $a$ sets the maximal value for 
$a_2$ (when $a = 0$ then $a_2 < 0.2929$, and when $a = 1$ then 
$a_2 < 0.6464$). This affects the small projections condition, to 
which constant $a_2$ belongs, which in turn affects performance. 
However from a rate standpoint (19) still holds, only now the 
value of const (which has the term $4/(a_2^2 \delta)$) becomes larger. 

In condition (8), the value $a$ affects the size of the term 
$a_1^{-1} + 2a_3(1 + a)$. The larger $a$ is, the more often (8) 
fails to be satisfied. Here there are two constants $a_1$ and $a_3$. 
Recall $a_1$ belongs to the invertability condition discussed 
in the previous subsection, which holds with rate (19) with 
$\mathrm{const} = 2c_1/(0.29 - a_1)^2$ and $a_1 \leq 0.29$. Consider the case 
where the non-zero signal magnitudes $|\alpha_i|$ are independently 
drawn from $\mathbb{R}_{[0,1]}$. Then we observe $(\min_{i \in S} |\alpha_i|) \leq t$ with 
probability $1 - (1 - t)^k$, where $t \in \mathbb{R}_{[0,1]}$ and $|S| = k$. For 
$t$ set equal to the RHS of (8), this gives the probability that 
condition (8) fails. Figure 6(b) shows good empirical match 
when setting $a_1 = 0.29$ and $a_3 = 1$, where the dotted curves 
predict the "error-floors" for various $k$, measurements $m = 50$ 
and $m = 150$, and noise $c_Z$. In the other case where $|\alpha_i| = 1$ 
(as in Figure 6(a)), condition (8) remains un-violated as long 
as $c_Z$ (and $a_1, a_3, n$) allow the RHS to be smaller than 1. 
Figure 6(a) suggests that for the appropriate choices for $a_1, a_3$, 
condition (O is always un-violated when < 5 x 10~^, and 
violated when cz > 1 x 1^^. For more discussion on noise 
effects see Supplementary Material D. 
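
The "error-floor" probability $1 - (1 - t)^k$ is easy to check by simulation. The sketch below assumes the magnitudes are uniform on $[0, 1]$, which is the distribution for which the formula is exact; the function names and the values of $t$ and $k$ are illustrative, and in the text $t$ would be the RHS of condition (8).

```python
import random

def prob_min_below(t, k):
    """Closed form for Pr{ min_i |alpha_i| <= t } with k IID uniform[0,1] magnitudes."""
    return 1.0 - (1.0 - t) ** k

def simulate_min_below(t, k, trials=50000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        smallest = min(rng.random() for _ in range(k))
        hits += int(smallest <= t)
    return hits / trials

if __name__ == "__main__":
    for k in (2, 5, 10):
        t = 0.1   # illustrative threshold only
        print(k, prob_min_below(t, k), simulate_min_below(t, k))
```

The floor grows with $k$ because more non-zero entries give more chances for one magnitude to fall below the threshold.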

The constant $a_3$ belongs to the remaining invertability 
projections condition. The fraction $u_3$ of size-$k$ subsets failing 
the invertability projections condition for some $a_3$ can be 
addressed using U-statistics. Consider the bounded kernel $g$, 
taking values in $\mathbb{R}_{[0,1]}$, set as 

$$ g(A, a) = \frac{1}{2^k} \sum_{\ell=1}^{2^k} \mathbb{1}\left\{ \|(A^T A)^\dagger \beta_\ell\|_\infty > a \right\} \tag{20} $$

where $\beta_\ell \in \{-1, 1\}^k$ and $(A^T A)^\dagger$ is the pseudoinverse 
of $A^T A$. Then $u_3 = U_n(a_3)$, and as before Theorem 1 
guarantees the upper bound (18), which depends on $p(a_3)$ 
where $p(a_3) = E\,g(A_S, a_3)$. 

We go on to discuss a bound on $p(a_3)$ under some general 
conditions. In [6], analysis on $p(a_3)$ (see Lemma 3.5) 
requires $\sigma^2_{\max}(A_S) \leq 1.5$, a condition not explicitly required in 
Theorem C. Also, empirical evidence suggests not to assume 
that $\sigma^2_{\max}(A_S) \leq 1.5$. For $m = 150$ and $k = 5$ we see 
from Figure 6 that (in the noiseless limit) the failure rate 
is on the order of 1 x 10^^, but in Figure |2j5) we see 
$\sigma^2_{\max}(A_S) > 1.5$ occurs with much larger fraction 0.1. Hence 
we take a different approach. Using ideas behind Bauer's 
generalization of Wielandt's inequality [38], the following 
proposition allows $\sigma^2_{\max}(A_S)$ to arbitrarily exceed 1.5. Also, it 
does not assume any particular distribution on entries of $A$. 

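Proposition 3. Let $S$ be a size-$k$ subset. Assume $k \geq 2$. Let 
$A_S$ be an $m \times k$ random matrix. Let $\delta_{\min}, \delta_{\max}$ be some positive 
constants. For any sign vector $\beta$ in $\{-1, 1\}^k$, we have 

$$ \Pr\left\{ \|(A_S^T A_S)^\dagger \beta\|_\infty > \frac{(\sqrt{k} + 1) \cdot |\tau_k - 1|}{\delta^2_{\min} \cdot (\tau_k + 1)} \right\} \leq \Pr\{\mathcal{E}^c(\delta_{\min}, \delta_{\max})\} \tag{21} $$

where $\mathcal{E}(\delta_{\min}, \delta_{\max}) = \{\delta_{\min} \leq \sigma_{\min}(A_S) \leq \sigma_{\max}(A_S) \leq 
\delta_{\max}\}$, and $\mathcal{E}^c(\delta_{\min}, \delta_{\max})$ is the complementary event of 
$\mathcal{E}(\delta_{\min}, \delta_{\max})$, and the constant $\tau_k$ satisfies 

$$ \tau_k = \tau_k(\delta_{\max}, \delta_{\min}) = \left(\frac{\delta_{\max}}{\delta_{\min}}\right)^2 \cdot \frac{1 + k^{-1/2}}{1 - k^{-1/2}}. \tag{22} $$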

We defer the proof for now. If $A_S^T A_S$ is "almost" an identity 
matrix, then we expect $\|(A_S^T A_S)^\dagger \beta\|_\infty \approx 1$ for any sign 
vector $\beta$ (hence our above heuristic whereby we set $a_3 = 1$). 
Proposition 3 makes a slightly weaker (but relatively general) 
statement. Now for some appropriately fixed $\delta_{\max}$ and $\delta_{\min}$, 
we expect $\Pr\{\mathcal{E}^c(\delta_{\min}, \delta_{\max})\}$ in (21) to drop exponentially 
in $m$. Just as the term $\Pr\{\sigma_{\min}(A_S) < \delta_{\min}\}$ in Proposition 
2 can be bounded by $\exp(-m \cdot (0.29 - \delta_{\min})^2/c_1)$, we can 
bound $\Pr\{\sigma_{\max}(A_S) > \delta_{\max}\} \leq \exp(-m(\delta_{\max} - 1.71)^2/c_1)$ for 
some $\delta_{\max} \geq 1.71$. Roughly speaking, $\sigma_{\max}(A_S) \leq 1.71$ (or 
$\sigma^2_{\max}(A_S) \leq 2.92$) with "high probability". We fix $\delta_{\min} = a_1$, 
where $a_1$ belongs to the invertability condition. 

So to bound $p(a_3)$, both (20) and Proposition 3 imply 
$p(a_3) \leq \Pr\{\mathcal{E}^c(\delta_{\min}, \delta_{\max})\}$ for $a_3 = (\sqrt{k} + 1) \cdot |\tau_k - 1| / (\delta^2_{\min} \cdot 
(\tau_k + 1))$. Now $\Pr\{\mathcal{E}^c(\delta_{\min}, \delta_{\max})\} \leq 2\exp(-m \cdot t^2/c_1)$, where 
we set $t = \delta_{\max} - 1.71 = 0.29 - a_1$ and $\delta_{\min} = a_1$. By (18), 
the rate (19) suffices to ensure $u_3 = U_n(a_3) \leq u$ for some 
fraction $u$, with the same const. Thus we proved the other 
main theorem, similar to Theorem 2. 

Theorem 3. Let $\Phi$ be an $m \times n$ matrix, where assume 
$n$ sufficiently large for Theorem 1 to hold. Sample $\Phi = A$ 
whereby the entries $A_{ij}$ are IID, and are Gaussian or bounded 
(as stated in Proposition 2). Then all three invertability, small 
projections, and invertability projections conditions in LASSO 
Theorem C for $(\beta, S)$ with $|S| = k \geq 2$, with $a_1 \leq 0.29$, 
with $a_2$ satisfying (7) for some $a$ set in the regularizer $\theta_n$, 
and with $a_3 = (\sqrt{k} + 1) \cdot |\tau_k - 1|/(a_1^2 \cdot (\tau_k + 1))$ for $\tau_k = 
\tau_k(1.42 - a_1, a_1)$ in (22), are satisfied for $u_1 + u_2 + u_3 = 4u$ 
for some small fraction $u$, if $m$ is on the order of (19) with 
$\mathrm{const} = \max(4/(a_1 a_2)^2,\ 2c_1/(0.29 - a_1)^2)$, and $c_1$ depends 
on the distribution of the $A_{ij}$'s. Note $\mathrm{const} \geq 4$. 

In the noiseless limit where only the first 2 conditions are 
required, this improves to $u_1 + u_2 = 3u$. 

Footnote (on the bound for $\sigma_{\max}$): For $m \geq 2k$ we have $\Pr\{\sigma_{\max}(A) > 1.71 + t\} \leq \Pr\{\sigma_{\max}(A) > 
1 + \sqrt{k/m} + t\} \leq \exp(-mt^2/c_1)$ for some $c_1$, see [27], Theorem 5.39. 

Remark 2. We emphasize again that the rate (19) is measured 
w.r.t. the three conditions in Theorem 3. The probability for 
which both noise conditions i) and ii) are satisfied, and for 
which condition (8) imposed on $\min_{i \in S} |\alpha_i|$ is satisfied, require 
additional consideration. For the former the probability 
is at least $1 - n^{-1}(2\pi \log n)^{-1/2}$, see [6]. For the latter, it has 
to be derived based on signal statistics, e.g., for $|\alpha_i| \in \mathbb{R}_{[0,1]}$ 
then $(\min_{i \in S} |\alpha_i|) > t$ is observed with probability $(1 - t)^k$ 
with $|S| = k$. 

Note that the choice for $a_3$ in Theorem 3 implies that the bound on 
$\|(A_S^T A_S)^\dagger \beta\|_\infty$ is roughly on the order $\sqrt{k}$. Indeed this is true 
since $\tau_k > 1$, and we note $\tau_k = (\delta_{\max}/\delta_{\min})^2 \cdot (1 + 2k^{-1/2} + o(k^{-1/2}))$, 
thus $\tau_k \approx (\delta_{\max}/\delta_{\min})^2$ for moderate $k$. Now LASSO recovery 
also depends on the probability that condition (8) holds. Our 
choice for $a_3$ causes the RHS of (8) to be roughly of the order 
$c_Z \sqrt{2k \log n}$. Compare this to [6] (see Theorem 1.3) where it 
was assumed that $\sigma^2_{\max}(A_S) \leq 1.5$; they only require $a_3 = 3$, 
i.e., a factor of $\sqrt{k}$ is lost without this assumption (which 
was previously argued to be fairly restrictive). To improve 
Proposition 3 one might additionally assume some specific 
distributions on $A$. We leave further improvements to future 
work. 

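Proof of Proposition 3: For notational convenience, put 
$X = (A_S^T A_S)^\dagger$. Bound the probability 

$$ \Pr\{\|X\beta\|_\infty > a\sqrt{k}\} \leq \Pr\{\|X\beta\|_\infty > a\sqrt{k} \mid \mathcal{E}(\delta_{\min}, \delta_{\max})\} + \Pr\{\mathcal{E}^c(\delta_{\min}, \delta_{\max})\}, \tag{23} $$

where we take $a$ to mean 

$$ a = \left(1 + k^{-1/2}\right) \cdot \frac{|\tau_k - 1|}{\tau_k + 1} \cdot \frac{1}{\sigma^2_{\min}(A_S)} \tag{24} $$

for $\tau_k$ chosen as in (22). We claim that every entry $(X\beta)_i$ 
of $X\beta$ is upper bounded in magnitude by $a\sqrt{k}$, for $a$ as in (24). Then by 
definition of $\mathcal{E}(\delta_{\min}, \delta_{\max})$, the first term in (23) equals 0 and 
we would have proven the bound (21). 

Let $C$ denote a $k \times 2$ matrix. The first column of $C$ is a 
normalized version of $\beta$, more specifically it equals $k^{-1/2}\beta$. 
The second column equals the canonical basis vector $c_i$, where 
$c_i$ is a 0-1 vector whereby $(c_i)_j = 1$ if and only if $j = i$. 
Consider the $2 \times 2$ matrix $X'$ that satisfies $X' = C^T X C$. 
This matrix $X'$ is symmetric (from symmetry of $X$) and 
$k^{-1/2}(X\beta)_i = X'_{1,2} = X'_{2,1}$ (from our construction of $C$). 
That is, the entry $X'_{1,2}$ (and $X'_{2,1}$) of $X'$ corresponds to the 
(scaled) quantity $k^{-1/2}(X\beta)_i$ that we want to bound. 

Condition on the event $\mathcal{E}(\delta_{\min}, \delta_{\max})$, then $A_S$ has rank $k$ 
and therefore $X = (A_S^T A_S)^\dagger = (A_S^T A_S)^{-1}$. Let $\det(\cdot)$ and 
$\mathrm{Tr}(\cdot)$ denote determinant and trace. As in [38] equation (11), 
we have 

$$ 1 - \frac{(X'_{1,2})^2}{X'_{1,1} X'_{2,2}} = \frac{4\det(X')}{(\mathrm{Tr}(X'))^2 - (X'_{1,1} - X'_{2,2})^2} \geq \frac{4\det(X')}{(\mathrm{Tr}(X'))^2} = \frac{4t}{(1 + t)^2} \tag{25} $$

where $t = \varsigma_{\max}(X')/\varsigma_{\min}(X')$ and $\varsigma_{\max}$ and $\varsigma_{\min}$ respectively 
denote the maximum and minimum eigenvalues. Now $t = 
\varsigma_{\max}(X')/\varsigma_{\min}(X') \geq 1$. If $t = 1$ then $4t/(1 + t)^2 = 1$, and 
for $t > 1$ the function $4t/(1 + t)^2$ decreases monotonically. 
We claim that $\tau_k$ in (22) upper bounds $\varsigma_{\max}(X')/\varsigma_{\min}(X')$, and 
then allows us to produce the following upper bound 

$$ |X'_{1,2}| \leq (X'_{1,1} X'_{2,2})^{1/2} \cdot \frac{|\tau_k - 1|}{\tau_k + 1}. \tag{26} $$

Bound $(X'_{1,1} X'_{2,2})^{1/2}$ by the maximum eigenvalue $\varsigma_{\max}(X')$ of 
$X'$. Then, further bound $\varsigma_{\max}(X')$ by $(1 + k^{-1/2})/\sigma^2_{\min}(A_S)$, 
which gives the form (24). This bound is argued as follows. 
For $k \geq 2$, we have the columns in $C$ to be linearly 
independent. Since $X' = C^T X C$ and $X$ is positive definite, 
it is then clear that $\varsigma_{\max}(X') \leq \varsigma_{\max}(C^T C) \cdot \varsigma_{\max}(X)$. Now 
$C^T C$ is a $2 \times 2$ matrix with diagonal elements 1, and off-diagonal 
elements $\pm 1/\sqrt{k}$. Hence $\varsigma_{\max}(C^T C) = 1 + k^{-1/2}$. 
Also $\varsigma_{\max}(X) \leq 1/\sigma^2_{\min}(A_S)$, and the bound follows. 

To finish, we show the claim $\tau_k \geq \varsigma_{\max}(X')/\varsigma_{\min}(X')$. By 
similar arguments as above, it follows that 

$$ \frac{\varsigma_{\max}(X')}{\varsigma_{\min}(X')} \leq \frac{\varsigma_{\max}(C^T C)}{\varsigma_{\min}(C^T C)} \cdot \frac{\varsigma_{\max}(X)}{\varsigma_{\min}(X)} = \frac{1 + k^{-1/2}}{1 - k^{-1/2}} \cdot \frac{\sigma^2_{\max}(A_S)}{\sigma^2_{\min}(A_S)} $$

since $\varsigma_{\min}(X') \geq \varsigma_{\min}(C^T C) \cdot \varsigma_{\min}(X)$, and $\varsigma_{\min}(C^T C) = 1 - 
k^{-1/2}$, and $X = (A_S^T A_S)^{-1}$. On the event $\mathcal{E}(\delta_{\min}, \delta_{\max})$ the right-hand side is at most $\tau_k$. We are done. ■ 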
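
The deterministic core of this argument can be checked numerically: for a full-rank $A_S$, taking $\delta_{\min}$ and $\delta_{\max}$ equal to its actual extreme singular values, every sign vector should satisfy the bound in (21). The sketch below enumerates all $2^k$ sign vectors for a small $k$; the function names and the matrix sizes are assumptions for illustration.

```python
import itertools
import numpy as np

def tau_k(delta_max, delta_min, k):
    """Constant (22): (delta_max/delta_min)^2 * (1 + k^-1/2) / (1 - k^-1/2)."""
    r = k ** -0.5
    return (delta_max / delta_min) ** 2 * (1 + r) / (1 - r)

def a3_bound(delta_max, delta_min, k):
    """Threshold in (21): (sqrt(k)+1) * |tau_k - 1| / (delta_min^2 * (tau_k + 1))."""
    t = tau_k(delta_max, delta_min, k)
    return (np.sqrt(k) + 1) * abs(t - 1) / (delta_min ** 2 * (t + 1))

def worst_sign_vector_norm(A_S):
    """max over all sign vectors beta of ||(A_S^T A_S)^+ beta||_inf (needs small k)."""
    k = A_S.shape[1]
    X = np.linalg.pinv(A_S.T @ A_S)
    return max(np.abs(X @ np.array(b)).max()
               for b in itertools.product((-1.0, 1.0), repeat=k))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, k = 50, 4
    A_S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, k))
    s = np.linalg.svd(A_S, compute_uv=False)   # descending singular values
    print("worst ||X beta||_inf:", worst_sign_vector_norm(A_S))
    print("bound from (21)     :", a3_bound(s[0], s[-1], k))
```

The gap between the two printed numbers gives a feel for how much of the $\sqrt{k}$ factor is slack for near-orthonormal columns.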

V. Conclusion 

We take a first look at U-statistical theory for predicting 
the "average-case" behavior of salient CS matrix parameters. 
Leveraging on the generality of this theory, we consider 
two different recovery algorithms i) $\ell_1$-minimization and ii) 
LASSO. The developed analysis is observed to have good potential 
for predicting CS recovery, and compares well (empirically) 
with Donoho & Tanner's [23] recent "average-case" analysis 
for system sizes found in implementations. Measurement 
rates that incorporate fractional $u$ failure rates are derived 
to be on the order of $k[\log((n - k)/u) + \sqrt{2(k/n) \log(n/k)}]$, 
similar to the known optimal $k \log(n/k)$ rate. Empirical observations 
suggest possible improvement for const (as opposed to 
typical "worst-case" analyses whereby implicit constants are 
known to be inherently large). 

There are multiple directions for future work. Firstly while 
restrictive maximum eigenvalue assumptions are avoided (as 
StRIP-recovery does not require them), the applied techniques 
could be fine-tuned. It is desirable to overcome the $\sqrt{k}$ losses 
observed here for noisy conditions. Secondly, it is interesting 
to further leverage the general U-statistical techniques for 
other recovery algorithms, to try and obtain good 
"average-case" analyses for them. Finally, one might consider 
similar U-statistical "average-case" analyses for the case where 
the sampling matrix columns are dependent, which requires 
appropriate extensions of Theorem 1. 

Appendix 

A. Proof of Theorem 1 

For notational simplicity we shall henceforth drop explicit 
dependence on $a$ from all three quantities $U_n(a)$, $p(a)$ and 
$g(A, a)$ in this appendix subsection. While $U_n$ is made explicit 
in Definition 1 as a statistic corresponding to the realization 
$\Phi = A$, this proof considers $U_n$ consisting of random terms 
$g(A_S)$ for purposes of making probabilistic estimates. Theorem 
1 is really a law of large numbers result. However even 
when the columns $A_i$ are assumed to be IID, the terms $g(A_S)$ 
in $U_n$ depend on each other. As such, the usual techniques 
for IID sequences do not apply. Aside from large deviation 
results such as Thm. 1 there exist strong law results, see [39]. 
The following proof is obtained by combining ideas taken 
from [33] and [34]. We use the following new notation just 
in this subsection of the appendix. Partition the index set 
$\{1, 2, \cdots, n\}$ into $\omega_n = \lfloor n/k \rfloor$ subsets denoted $S_i$ each of size 
$k$, and a single subset $\mathcal{R}$ of size at most $k$. More specifically, 
let $S_i = \{(i - 1) \cdot k + 1, (i - 1) \cdot k + 2, \cdots, i \cdot k\}$ and let 
$\mathcal{R} = \{\lfloor n/k \rfloor \cdot k + 1, \lfloor n/k \rfloor \cdot k + 2, \cdots, n\}$. Let $\pi$ denote a 
permutation (bijective) mapping $\{1, 2, \cdots, n\} \to \{1, 2, \cdots, n\}$. 
The notation $\pi(S)$ denotes the set of all images of each 
element in $S$, under the mapping $\pi$. Following Section 5c 
in [33] we express the U-statistic $U_n$ of $A$ in the form 

$$ U_n = \frac{1}{n!} \sum_{\pi} \left( \frac{1}{\omega_n} \sum_{i=1}^{\omega_n} g(A_{\pi(S_i)}) \right), \tag{27} $$

the first summation taken over all $n!$ possible permutations 
$\pi$ of $\{1, 2, \cdots, n\}$. To verify, observe that any subset $S$ is 
counted exactly $\omega_n \cdot k!(n - k)!$ times in the RHS of (27). 

Recall $p = E\,g(A_S) = E\,U_n$. From the theorem statement 
let the term $\epsilon_n^2$ equal $c\,p(1 - p) \cdot \omega_n^{-1} \log \omega_n$ where $c > 2$. We 
show that the probabilities $\Pr\{|U_n - p| > \epsilon_n\}$ for each $n$ are 
small. For brevity, we shall only explicitly treat the upper tail 
probability $\Pr\{U_n - p > \epsilon_n\}$, where standard modifications 
of the below arguments will address the lower tail probability 
$\Pr\{-U_n + p > \epsilon_n\}$ (see comment in p. 1, [33]). Using the 
expression (27) for $U_n$, write the probability $\Pr\{U_n - p > \epsilon_n\}$ 
for any $h > 0$ as 

$$ \Pr\{U_n - p > \epsilon_n\} \leq E \exp\left(h(U_n - p - \epsilon_n)\right) = E \exp\left( \frac{1}{n!} \sum_{\pi} h(S'_\pi - p - \epsilon_n) \right), $$

where here $S'_\pi$ is a RV that equals the inner summation in 
(27), i.e. $S'_\pi = \omega_n^{-1} \sum_{i=1}^{\omega_n} g(A_{\pi(S_i)})$. Using convexity of the 
function $\exp(\cdot)$ we express 

$$ \Pr\{U_n - p > \epsilon_n\} \leq \frac{1}{n!} \sum_{\pi} E \exp\left(h(S'_\pi - p - \epsilon_n)\right). $$

Now observe that the RV $S'_\pi$ is an average of $\omega_n$ IID terms 
$g(A_{\pi(S_i)})$. This is due to the assumption that the columns 
$A_i$ of $A$ are IID, and also due to the fact that the sets 
$\pi(S_i)$ are disjoint (recall the sets $S_i$ are disjoint). Hence for any 
permutation $\pi$, by this independence we have $E \exp(h S'_\pi) = 
(E \exp(h' \cdot g(A_{\pi(S_i)})))^{\omega_n}$, where the normalization $h' = h/\omega_n$ 
bears no consequence. The RV $g(A_{\pi(S_i)})$ is bounded, i.e. 
$0 \leq g(A_{\pi(S_i)}) \leq 1$, and its expectation $E\,g(A_{\pi(S_i)})$ equals 
$p$. By convexity of $\exp(\cdot)$ again and for all $h > 0$, the 
inequality $e^{h\alpha} \leq e^{h}\alpha + 1 - \alpha$ holds for all $0 \leq \alpha \leq 1$. 
Therefore putting $\alpha = g(A_{\pi(S_i)})$ we get the inequality 
$\exp(h \cdot g(A_{\pi(S_i)})) \leq 1 + (e^h - 1) \cdot g(A_{\pi(S_i)})$. By the irrelevance 
of $\pi$ in the previous arguments, and by putting $E\,g(A_{\pi(S_i)}) = p$, 

$$ \Pr\{U_n - p > \epsilon_n\} \leq \left( e^{-h(\epsilon_n + p)} \left(1 - p + p e^{h}\right) \right)^{\omega_n}, $$

where $h$ now stands for the normalized value $h'$. We optimize the bound by putting $p e^h = (1 - p)(p + \epsilon_n)/(1 - 
p - \epsilon_n)$, see (4.7) in [33], to get 

$$ \Pr\{U_n - p > \epsilon_n\} \leq \left( (1 + \epsilon_n p^{-1})^{p + \epsilon_n} (1 - \epsilon_n (1 - p)^{-1})^{1 - p - \epsilon_n} \right)^{-\omega_n}. \tag{28} $$

Following (2.20) in [34] we use the relation $\log(1 + \alpha) = 
\alpha - \frac{1}{2}\alpha^2 + o(\alpha^2)$ as $\alpha \to 0$, to express the logarithmic exponent 
on the RHS of (28) as 

$$ -\frac{\omega_n \epsilon_n^2 (1 + o(1))}{2p(1 - p)}. $$

Therefore by the form $\epsilon_n^2 = c\,p(1 - p) \cdot \omega_n^{-1} \log \omega_n$ where 
$c > 2$, for sufficiently large $n$ we have 

$$ \Pr\{U_n - p > \epsilon_n\} \leq \omega_n^{-c/2} \ll 1 $$

which in turn implies $\sum_{n=k}^{\infty} \Pr\{U_n - p > \epsilon_n\} < \infty$. 
Repeating similar arguments for the lower tail probability 
$\Pr\{-U_n + p > \epsilon_n\}$, we eventually prove $\sum_{n=k}^{\infty} \Pr\{|U_n - p| > 
\epsilon_n\} < \infty$, which (by the Borel-Cantelli lemma) implies the claim. 

Supplementary Material 

A. Proofs of StRIP-type recovery guarantees appearing in 
Subsection II-B 

In this part of the appendix we provide the proofs for the 
two StRIP-type recovery guarantees discussed in this paper. 
The following are proofs for Theorems B and C. 

Proof of Theorem B, cf, Lemma 3, [7 J: Define e G R" 
as e = a* — a, i.e., e is the recovery error vector. The proof 
technique closely follows that of Theorem 1.2, cf, f29\. Since 
sgn(as) — we have the inequality 

\\{a + e)s\\i>\\as\\i+P^es. (29) 

Since a* solves ([T), hence | |a* 1 1 1 < | jo: ||i. Putting a* — a+e, 
we have 

||a||i > ||a + c||i = ||(a+e)5||i + ||(Q: + e)5j|i 

> I las 111 +P^es + llesJIi - HasJIi, 

(30) 

where the last step follows the inequality ( |29l ). and the 
triangular inequality. Re-arranging and putting ||q(5^||i = 
||a||i - lla^lli we get 

lle^Jli < -/S^e5 + 2||a5j|i. (31) 

We next bound the term —0^es with \P'^es\, and for now 
assume that the following claim holds 

w'^^s\<\\p'^^l^s^\oo■\\es^\l. (32) 

We then proceed to show the bound on $\|e_{S^c}\|_1$ (or $\|\alpha^*_{S^c} - \alpha_{S^c}\|_1$)
to complete the first part of the proof. Using the small
projections condition, bound $\|\beta^T \Phi_S^\dagger \Phi_{S^c}\|_\infty \le a_2$ using some
$a_2 < 1$. This gives an upper bound of $a_2 \cdot \|e_{S^c}\|_1$ on $|\beta^T e_S|$
in (32). Finally, use this in (31) to get $\|e_{S^c}\|_1 \le a_2 \|e_{S^c}\|_1 + 2\|\alpha_{S^c}\|_1$,
or equivalently $\|e_{S^c}\|_1 \le 2/(1 - a_2) \cdot \|\alpha_{S^c}\|_1$. To
show the claim (32), note that $e$ is in the null-space of $\Phi$,
i.e. $\Phi e = 0$, or equivalently, $\Phi_S e_S = -\Phi_{S^c} e_{S^c}$. Let $I$ denote
the size-$k$ identity matrix. By the invertibility condition, the
pseudoinverse $\Phi_S^\dagger$ satisfies $\Phi_S^\dagger \Phi_S = I$. Hence

$$e_S = -\Phi_S^\dagger \Phi_{S^c} e_{S^c}, \qquad (33)$$

and taking the vector inner product with $\beta$ on both sides gives
$\beta^T e_S = -\beta^T \Phi_S^\dagger \Phi_{S^c} e_{S^c}$. Finally (32) holds by taking the absolute
value and writing $|\beta^T \Phi_S^\dagger \Phi_{S^c} e_{S^c}| \le \|\beta^T \Phi_S^\dagger \Phi_{S^c}\|_\infty \cdot \|e_{S^c}\|_1$.

The second part is to elucidate the bound on $\|e_S\|_1$ (or
$\|\alpha^*_S - \alpha_S\|_1$). Starting from the previous relationship (33)
we have $\|e_S\|_1 = \|\Phi_S^\dagger \Phi_{S^c} e_{S^c}\|_1 \le \|\Phi_S^\dagger \Phi_{S^c}\|_\infty \cdot \|e_{S^c}\|_1$.
The result then follows by using the worst-case projections
condition to bound $\|\Phi_S^\dagger \Phi_{S^c}\|_\infty$ by some positive $a_3$, and also
bounding $\|e_{S^c}\|_1$ using the bound obtained in the first part of
this proof. ■
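
As an informal numerical sanity check of the null-space identity (33) and the claim (32), the following Python sketch draws a small random matrix, picks a vector in its null space, and evaluates both sides. The sizes, the random seed, the support `S` and the sign pattern `beta` are illustrative assumptions and are not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 10, 3                        # illustrative sizes only (assumption)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
S = np.arange(k)                          # assumed support: the first k columns
Sc = np.arange(k, n)                      # complement of the support
beta = np.array([1.0, -1.0, 1.0])         # assumed sign pattern on S

# Any null-space vector e of Phi: a right singular vector that belongs
# to a zero singular value (Phi has rank at most m < n).
_, _, Vt = np.linalg.svd(Phi)
e = Vt[-1]

Phi_S_pinv = np.linalg.pinv(Phi[:, S])    # pseudoinverse of Phi_S
proj = Phi_S_pinv @ Phi[:, Sc]            # k x (n-k) matrix Phi_S^+ Phi_{S^c}

# Identity (33): e_S = -Phi_S^+ Phi_{S^c} e_{S^c}
lhs33, rhs33 = e[S], -proj @ e[Sc]

# Claim (32): |beta^T e_S| <= ||beta^T Phi_S^+ Phi_{S^c}||_inf * ||e_{S^c}||_1
lhs32 = abs(beta @ e[S])
rhs32 = np.max(np.abs(beta @ proj)) * np.sum(np.abs(e[Sc]))

print(np.allclose(lhs33, rhs33), lhs32 <= rhs32)
```

The second comparison is just Hölder's inequality applied to the row vector $\beta^T \Phi_S^\dagger \Phi_{S^c}$, which is why it holds for any null-space vector and not only for this draw.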

For the next two proofs we use the following notation. Let
$I$ denote the identity matrix, and let $P$ denote a projection
matrix onto the column subspace of $\Phi_S$, i.e., $P = \Phi_S \Phi_S^\dagger$. We
first address the proof of Proposition 4.

Proposition 4 (cf. [6]). Let $Z$ be a random noise vector,
whose components are IID zero mean Gaussian with variance
$c_Z^2$. Assume that the matrix $\Phi$ satisfies $\|\phi_i\|_2 = 1$
for all columns $\phi_i$. Then the realization $Z = z$ satisfies
conditions i) and ii) in Theorem C with probability at least
$1 - n^{-1}(2\pi \log n)^{-1/2}$.

Proof of Proposition 4, cf. [6]: The result will follow by
showing that i) fails with probability at most $k \cdot n^{-2}(2\pi \log n)^{-1/2}$, and by
showing that ii) fails with probability at most $(n - k) \cdot n^{-2}(2\pi \log n)^{-1/2}$.

For i), first assume each component of $Z$ has variance 1.
Let $c_i$ denote the $i$-th row of $(\Phi_S^T \Phi_S)^{-1} \Phi_S^T$, thus we have
$\|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T Z\|_\infty = \max_i |c_i^T Z|$. Since $Z$ is Gaussian,

$$\Pr\{\|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T Z\|_\infty > z\} \le k \cdot \Pr\{|Z| > z\}, \qquad (34)$$

where $Z$ on the right-hand side is a scalar Gaussian RV with standard deviation at least
the $\ell_2$-norm of any row $c_i$. It remains to upper bound
$\|c_i\|_2$ for all $i$, which follows as $\|c_i\|_2 \le \|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T\|_2$.
The spectral norm $\|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T\|_2$ is at most the reciprocal
of the smallest non-zero singular value of $\Phi_S$, and
by the invertibility condition for some positive $a_1$, we have
$\|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T\|_2 \le a_1^{-1}$. Then we let $Z$ in (34) have standard
deviation $a_1^{-1}$. Equivalently,

$$\Pr\{\|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T Z\|_\infty > z\} \le k \cdot \Pr\{|Z| > a_1 \cdot z\} \le 2k \cdot f_Z(a_1 z)/(a_1 z), \qquad (35)$$

where $Z$ is a standard normal RV with density function
$f_Z(z)$. Generalizing to the case where each component
of $Z$ has variance $c_Z^2$, the upper bound becomes $2k \cdot f_Z((a_1 z)/c_Z)/((a_1 z)/c_Z)$.
Put $z = (c_Z \sqrt{2 \log n})/a_1$ to get
the claimed probabilistic upper estimate $k \cdot n^{-2}(2\pi \log n)^{-1/2}$.

For ii) we proceed similarly. Observe that for any $i \notin S$, we
have $\|\phi_i^T (I - P)\|_2 \le \|\phi_i\|_2 = 1$. Then put $z = c_Z \, 2\sqrt{\log n}$
in case ii) to get the claimed probabilistic upper estimate
$(n - k) \cdot n^{-2}(2\pi \log n)^{-1/2}$. ■
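
The tail estimate used in case ii) can be checked numerically. The sketch below, in plain Python, evaluates the standard Gaussian tail bound $\Pr\{|Z| > z\} \le 2 f_Z(z)/z$ at the normalized threshold $z = 2\sqrt{\log n}$ and compares it with the closed form $n^{-2}(2\pi \log n)^{-1/2}$. The function names and the choice $n = 1000$ are illustrative assumptions.

```python
import math

def gaussian_tail_bound(z):
    """Upper bound 2*f(z)/z on Pr{|Z| > z} for a standard normal Z and z > 0."""
    density = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return 2.0 * density / z

def exact_two_sided_tail(z):
    """Exact Pr{|Z| > z} for a standard normal Z."""
    return math.erfc(z / math.sqrt(2.0))

n = 1000                                   # illustrative block length (assumption)
z = 2.0 * math.sqrt(math.log(n))           # threshold of case ii) with c_Z = 1
bound = gaussian_tail_bound(z)
closed_form = n ** -2 / math.sqrt(2.0 * math.pi * math.log(n))

print(bound, closed_form, exact_two_sided_tail(z))
```

Multiplying the per-column bound by the $n - k$ columns outside the support gives the union-bound estimate quoted for case ii).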

Proof of Theorem 1.3, cf. [6]: We shall show that any
signal $\alpha$ with sign $\beta$ and support $S$, assuming $(\beta, S)$ satisfy
all three invertibility, small projections, and invertibility
projections conditions together with (7) and (8), will have both
sign and support successfully recovered.

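The proof follows by constructing a vector $\alpha'$ from $\alpha$ as
follows. Let $e$ denote the error $e = \alpha' - \alpha$, and define $\alpha'$
by letting $e$ satisfy

$$e_S = (\Phi_S^T \Phi_S)^{-1}(\Phi_S^T z - 2 c_Z \theta_n \beta), \qquad e_{S^c} = 0. \qquad (36)$$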



Let us first claim that if (8) holds, then the support of $\alpha'$ equals
that of $\alpha$. If this is true, then standard subgradient arguments,
see [6], [3], will lead us to conclude that $\alpha'$ must be the
unique Lasso (6) solution (i.e., $\alpha' = \alpha^*$) if i) it satisfies

$$\phi_i^T (b - \Phi\alpha') = 2 c_Z \theta_n \cdot \operatorname{sgn}(\alpha'_i), \quad \text{if } i \in S,$$
$$|\phi_i^T (b - \Phi\alpha')| < 2 c_Z \theta_n, \quad \text{if } i \notin S, \qquad (37)$$



and ii) the submatrix $\Phi_S$ has full column rank. The condition
ii) follows from the invertibility condition, and the latter half
of the proof will verify i). Let us first verify the previous claim
that both $\alpha'$ and $\alpha$ have exactly the same supports. In fact, we go
further to verify that $\alpha'$ and $\alpha$ also have the same signs. First
check

$$\|e_S\|_\infty \le \|(\Phi_S^T \Phi_S)^{-1} \Phi_S^T z\|_\infty + 2 c_Z \theta_n \cdot \|(\Phi_S^T \Phi_S)^{-1} \beta\|_\infty \le \frac{\sqrt{2 \log n}}{a_1} c_Z + 2 a_3 c_Z \cdot \theta_n, \qquad (38)$$

where the final inequality follows from noise condition i) from
Proposition 4 and the invertibility projections condition, which
provides the bound $\|(\Phi_S^T \Phi_S)^{-1} \beta\|_\infty \le a_3$ for some positive
$a_3$. By assumption (8) and comparing with the above upper
estimate for $\|e_S\|_\infty$, our claim must hold.

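Next we go on to verify that $\alpha'$ satisfies (37). We have

$$b - \Phi\alpha' = z - \Phi e = z - (\Phi_S^\dagger)^T (\Phi_S^T z - 2 c_Z \theta_n \cdot \beta), \qquad (39)$$

where the last equality follows by first writing $\Phi e = \Phi_S e_S$,
then substituting (36), and putting $\Phi_S^\dagger = (\Phi_S^T \Phi_S)^{-1} \Phi_S^T$. Now
because $(\Phi_S^\dagger)^T$ is a right inverse of $\Phi_S^T$, by left multiplying the
above expression by $\Phi_S^T$ we conclude

$$\Phi_S^T (b - \Phi\alpha') = 2 c_Z \theta_n \cdot \beta,$$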

which is equivalent to the first set of equations of (37), as
we verified before that $\beta = \operatorname{sgn}(\alpha'_S)$. For the second set of
equations, observe from (39) that

$$(I - P)(b - \Phi\alpha') = (I - P) z,$$
$$P (b - \Phi\alpha') = 2 c_Z \theta_n \cdot (\Phi_S^\dagger)^T \beta,$$

where the first equality follows because $(I - P)(\Phi_S^\dagger)^T = 0$,
and the second equality follows because $P (\Phi_S^\dagger)^T \Phi_S^T = P P^T = P^2 = P$.
Using the above two identities, we estimate

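$$\|\Phi_{S^c}^T (b - \Phi\alpha')\|_\infty \le \|\Phi_{S^c}^T (I - P)(b - \Phi\alpha')\|_\infty + \|\Phi_{S^c}^T P (b - \Phi\alpha')\|_\infty$$
$$= \|\Phi_{S^c}^T (I - P) z\|_\infty + 2 c_Z \theta_n \cdot \|\Phi_{S^c}^T (\Phi_S^\dagger)^T \beta\|_\infty$$
$$\le \frac{c_Z \sqrt{2}\, \theta_n}{1 + a} + 2 c_Z \theta_n a_2, \qquad (40)$$

where the upper estimate $(c_Z \sqrt{2}\, \theta_n)/(1 + a) = c_Z \, 2\sqrt{\log n}$
follows from noise condition ii) stated in Proposition 4, and
$\|\Phi_{S^c}^T (\Phi_S^\dagger)^T \beta\|_\infty \le a_2$ follows from the small projections
property. Finally, from assuming (7) we have $\sqrt{2}(1 + a)^{-1} + 2 a_2 < 2$,
and applying this to the last member of (40) proves
$\|\Phi_{S^c}^T (b - \Phi\alpha')\|_\infty < 2 c_Z \theta_n$, which verifies that $\alpha'$ satisfies the
second set of equations of (37). Thus we have verified $\alpha' = \alpha^*$,
which is what we need to complete the proof. ■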
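
The algebra behind the first set of optimality equations can be illustrated numerically. The Python sketch below builds the candidate vector from (36) for one random instance and evaluates $\Phi_S^T(b - \Phi\alpha')$, which the argument above says should equal $2 c_Z \theta_n \beta$. All sizes, the seed, the noise level `c_z`, the constant `a`, and the signal values are illustrative assumptions; $\theta_n = (1 + a)\sqrt{2 \log n}$ is taken from the relation $(c_Z \sqrt{2}\,\theta_n)/(1 + a) = c_Z\, 2\sqrt{\log n}$ used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 40, 60, 3                        # illustrative sizes only (assumption)
Phi = rng.standard_normal((m, n))
Phi /= np.linalg.norm(Phi, axis=0)         # unit-norm columns, as in Proposition 4
S = np.arange(k)                           # assumed support: the first k columns

alpha = np.zeros(n)
alpha[S] = [1.0, -1.0, 1.0]                # assumed exactly sparse signal
beta = np.sign(alpha[S])

c_z, a = 0.01, 0.0                         # assumed noise level and constant a
theta_n = (1.0 + a) * np.sqrt(2.0 * np.log(n))
z = c_z * rng.standard_normal(m)           # noise realization
b = Phi @ alpha + z                        # noisy measurements

# Candidate from (36): the error lives on S only.
Phi_S = Phi[:, S]
gram_inv = np.linalg.inv(Phi_S.T @ Phi_S)
e_S = gram_inv @ (Phi_S.T @ z - 2.0 * c_z * theta_n * beta)
alpha_prime = alpha.copy()
alpha_prime[S] += e_S

# First set of equations in (37), restricted to the support.
on_support = Phi_S.T @ (b - Phi @ alpha_prime)
print(on_support, 2.0 * c_z * theta_n * beta)
```

This only exercises the identity on the support; the strict inequality off the support depends on the small projections and noise conditions and is not asserted here.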



B. Derivation of standard bounds 

In the Gaussian case note $E X_i^2 = 1$ and $E X_i = 0$.
Then $\sum_i c_i X_i$ is also Gaussian with variance $\|c\|_2^2$. Hence
by Markov's inequality we have the (single-sided) inequality
$\Pr\{\sum_i c_i X_i > t\} \le \exp(-ht + h^2 \|c\|_2^2 / 2)$ for any
$h > 0$. The claim for the Gaussian case will follow by
setting $h = t \cdot \|c\|_2^{-2}$, and noting that for the other
side $\Pr\{-(\sum_i c_i X_i) > t\} = \Pr\{\sum_i c_i X_i > t\}$. For the
bounded case, note $|X_i| \le 1$ and $E X_i = 0$, and the claim
follows from Hoeffding's (2.6) in 



C. Derivation of measurement rates 

For the small projections condition, start from $p(a_2)$ being
bounded by the RHS of (14) where $a = a_2$. As before, bound
$\Pr\{\sigma_{\min}(\Phi_S) \le a_1\} \le \exp(-m \cdot (0.29 - a_1)^2 / c_1)$, where
the threshold was set to $a_1$. From the identity $\sqrt{\alpha_1 + \alpha_2} \le \sqrt{\alpha_1} + \sqrt{\alpha_2}$ for
positive quantities $\alpha_1, \alpha_2$, it follows from Theorem 1 and (15) that
we will have $u_2 \le (n - k) \cdot U_n(a_2) \le 2u$, if we enforce

$$\log 2 + \log(n - k) - \frac{m (a_1 a_2)^2}{2k} + t \le \log u,$$
$$\log(n - k) - \frac{m (0.29 - a_1)^2}{c_1} + t \le \log u,$$



where $t = \sqrt{2 (k/n) \log(n/k)}$. Ignoring the $\log 2$ term, and
using $\sqrt{n - k} \le n - k$, it follows that (19) enforces the two
above conditions.

Similarly for the invertibility condition, to have $u_1 =
U_n(a_1) \le u$, it follows from Theorem 1 and (15) that we
need to enforce the second condition above.

For the worst-case projections condition, to have $u_3 \le (n - k) \cdot U_n(a_3) \le 2u$
we need to enforce

$$(k + 1) \cdot \log 2 + \log(n - k) - \frac{m (a_1 a_3)^2}{2k} + t \le \log u,$$
$$k \log 2 + \log(n - k) - \frac{m (0.29 - a_1)^2}{c_1} + t \le \log u.$$

Taking

$$k \log \frac{n - k}{u} \ge (k + 1) \cdot \log 2 + \log \frac{n - k}{u},$$

justifiable for $(n - k)/u$ suitably larger than 2, the rate (19)
generously suffices to ensure these 2 conditions.
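
To see what "suitably larger than 2" means in practice, the Python sketch below assumes the simplifying condition reads $k \log x \ge (k + 1) \log 2 + \log x$ with $x = (n - k)/u$; this reading of the displayed inequality is an assumption. Under it, the condition is equivalent to $x \ge 2^{(k+1)/(k-1)}$ for $k > 1$, which approaches 2 as $k$ grows. The function names are hypothetical.

```python
import math

def log_condition_holds(k, x):
    """Check k*log(x) >= (k+1)*log(2) + log(x), with x standing for (n-k)/u."""
    return k * math.log(x) >= (k + 1) * math.log(2.0) + math.log(x)

def threshold(k):
    """Smallest x satisfying the assumed condition, valid for k > 1."""
    return 2.0 ** ((k + 1) / (k - 1))

# The threshold shrinks towards 2 as the sparsity k increases.
for k in (3, 10, 101):
    print(k, threshold(k))
```

For example, with $k = 3$ the threshold is 4, so the simplification needs $(n - k)/u \ge 4$ under this reading.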



In (b) and (c) we look at the other case where $|\alpha_i| \in \mathbb{R}_{[0,1]}$.
Here (c) plots the probability $1 - (1 - t)^k$ that (8) fails. Again
the contoured lines delineate a particular fixed value of $1 - (1 - t)^k$
for various $k, n$ values, whereby we set $t = 7.4 \cdot c_Z \sqrt{2 \log n}$
(recall we used $a = 1$ here). We observe how closely (c)
tracks the noise floor regions in (b) (indicated by shading).
More specifically, note that $t$ depends on $n$, and the larger
the probabilities $1 - (1 - t)^k$ get for various $k, n$ in Figure
D.1(c), the more this probability overwhelms the LASSO recovery rates
in Figure D.1(b). This matches our previous observations
in Figure 6(b).
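
The curve plotted in panel (c) is simple to evaluate directly. The Python sketch below computes $t = 7.4 \cdot c_Z \sqrt{2 \log n}$ and then $1 - (1 - t)^k$. Interpreting this expression as the chance that at least one of $k$ independent magnitudes drawn uniformly from $[0, 1]$ falls below $t$ is an assumption made here for illustration, as are the function names and the example values of `c_z`, `n` and `k`.

```python
import math

def t_noisy(c_z, n):
    """t = 7.4 * c_Z * sqrt(2 log n), the value used with a = 1."""
    return 7.4 * c_z * math.sqrt(2.0 * math.log(n))

def prob_condition_fails(t, k):
    """1 - (1 - t)^k, with t clipped to [0, 1] so the result is a probability."""
    t = min(max(t, 0.0), 1.0)
    return 1.0 - (1.0 - t) ** k

# Example values (assumptions): noise level 1e-3, block length 1000.
t = t_noisy(1e-3, 1000)
for k in (4, 10, 20):
    print(k, prob_condition_fails(t, k))
```

Because $t$ grows only like $\sqrt{\log n}$, the sparsity $k$ and the noise level $c_Z$ dominate how quickly this failure probability rises.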



D. More on noisy LASSO performance 

The aim here is to provide more empirical evidence to
support the observations made in Figure 6, for more block lengths.
Here Figure D.1 shows LASSO performance for a wider
range of $n$. We only consider $m = 150$, and show various
recovery failure rates displayed via contoured lines, for various
sparsities $k$ and block lengths $n$. Figures D.1(a) and (b) are
companions to Figures 6(a) and (b), in that they respectively
correspond to cases where the non-zero signal magnitudes $|\alpha_i|$
equal 1 (and $a = 0$), and lie in $\mathbb{R}_{[0,1]}$ (and $a = 1$). That is, for
n — 1000, and fc = 4 and cz = 1 x 10"'', we see the recovery 
failure is approximately 1 x lO^'^ in both Figure ID.ir a) and 
Figure 6(a).

As mentioned in Subsection IV-C, we observe a good empirical
match when adjusting the term $t = (a_1^{-1} + 2 a_3 (1 + a)) \cdot c_Z \sqrt{2 \log n}$
(on the RHS of (8)) with $a_1 = 0.29$ and $a_3 = 1$.
Figure D.1 provides further support. In (a) we show the values
of the term $t$ for values $n = 300$ and $n = 3000$. Recall that in
this case, when $t > 1$, condition (8) (and thus recovery) fails.
Observe that when $c_Z = 5 \times 10^{-2}$ the values of $t$ are very close
to 1, and for $c_Z = 1 \times 10^{-1}$ they exceed 1. This matches
our observation in Figure 6(a) that $c_Z = 5 \times 10^{-2}$ is
the critical point, beyond which for large $c_Z$ recovery fails
catastrophically.
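
The term $t$ can be tabulated directly from its definition. The Python sketch below evaluates $t = (a_1^{-1} + 2 a_3 (1 + a)) \cdot c_Z \sqrt{2 \log n}$ with $a_1 = 0.29$, $a_3 = 1$ and $a = 0$ at $n = 300$ and $n = 3000$. The two noise levels in the loop are assumed example values chosen to bracket the point where $t$ crosses 1, and the function name is hypothetical.

```python
import math

def t_term(c_z, n, a=0.0, a1=0.29, a3=1.0):
    """t = (1/a1 + 2*a3*(1 + a)) * c_Z * sqrt(2 log n)."""
    return (1.0 / a1 + 2.0 * a3 * (1.0 + a)) * c_z * math.sqrt(2.0 * math.log(n))

# Assumed example noise levels; block lengths as in the text.
for c_z in (0.05, 0.1):
    for n in (300, 3000):
        t = t_term(c_z, n)
        print(f"c_Z={c_z:g}, n={n}: t={t:.3f}, exceeds 1: {t > 1.0}")
```

With these inputs, $t$ sits just below 1 at $n = 300$ and just above 1 at $n = 3000$ for the smaller noise level, and exceeds 1 at both block lengths for the larger one.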
Fig. D.1. Empirical LASSO performance shown for $m = 150$ for a range of $k, n$ values. In (a) the non-zero signal magnitudes $|\alpha_i|$ equal 1, and in (b)
are in $\mathbb{R}_{[0,1]}$. In (c) we plot a curve (expression) $1 - (1 - t)^k$ for $t = (3.4 + 2(1 + a)) \cdot c_Z \sqrt{2 \log n}$.



